Some history: the idea has been around since the 1940s; in the 1990s, SVMs (using the kernel trick to handle non-linearly separable datasets) were shown to perform far better; around 2011, neural networks (using improved training techniques and more powerful computers) started outperforming SVMs.

Basic idea:

The idea is to mimic a neuron in the brain, whose basic components are dendrites, a nucleus, an axon, and axon terminals.

Neurons transmit information across a synapse, the junction between the dendrites of one neuron and the axon terminals of another neuron.

How computer scientists mimic this:

The circles (i.e., nodes) represent neurons and perform functions on the data.

Each “column” is a layer; the first layer is the input data layer and the last is the output layer. Unless the input connects directly to the output, you have at least one “hidden” layer in between. The more layers you have, the “deeper” your network is.

The lines connecting the nodes are weighted; initially, the weights are just random. We have to learn the best weights by comparing the network’s predictions to the training data (i.e., the actual outputs).

At each layer, the inputs are weighted and summed. An “activation” function then decides whether the final sum maps to 0 or 1 (or whatever your class values are), often by comparing it against some threshold.
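A minimal sketch of what a single node does, assuming a sigmoid activation and a 0.5 threshold (the inputs, weights, bias, and threshold below are made-up values for illustration, and `neuron` is a hypothetical helper):

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias, threshold=0.5):
    """One node: weighted sum of its inputs, sigmoid activation, then a threshold.

    The 0.5 threshold is an illustrative assumption, not a fixed rule.
    """
    weighted_sum = np.dot(inputs, weights) + bias
    activation = sigmoid(weighted_sum)
    return activation, int(activation >= threshold)

# Hypothetical inputs and weights, chosen only to show the mechanics
x = np.array([1.0, 0.0, 1.0])
w = np.array([0.4, -0.6, 0.3])
act, label = neuron(x, w, bias=-0.2)
print(act, label)
```

Here the weighted sum is 0.4 + 0.3 - 0.2 = 0.5, the sigmoid maps it to about 0.62, and the threshold turns that into class 1.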

More detail:

- Input layer $x$ (i.e., the non-decision attribute data)

- Arbitrary number of hidden layers

- Output layer $\hat{y}$ (i.e., the decision attribute in the dataset)

- A set of weights and biases between each pair of adjacent layers, the $W$’s and $b$’s

- An activation function for each hidden layer, $\sigma$ (the most common is the sigmoid activation function)

A 2-layer NN with 3 attributes and a hidden layer of 4 nodes conceptually looks like:

The output of a 2-layer NN is: $\hat{y} = \sigma(W_2\,\sigma(W_1 x + b_1) + b_2)$
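To make the shapes in $\hat{y} = \sigma(W_2\,\sigma(W_1 x + b_1) + b_2)$ concrete, here is a minimal NumPy sketch of the 3-attribute, 4-hidden-node, 1-output network with biases. It stores one instance per row of `X`, so `X @ W1` is the row-per-instance form of $W_1 x$; the random weights, zero biases, and the name `forward` are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)

n_attributes, n_hidden, n_outputs = 3, 4, 1
W1 = rng.random((n_attributes, n_hidden))   # input -> hidden weights
b1 = np.zeros(n_hidden)                     # hidden-layer biases
W2 = rng.random((n_hidden, n_outputs))      # hidden -> output weights
b2 = np.zeros(n_outputs)                    # output bias

def forward(X):
    """Compute y_hat = sigmoid(sigmoid(X W1 + b1) W2 + b2) for every row of X."""
    hidden = sigmoid(X @ W1 + b1)       # shape (n_rows, 4)
    return sigmoid(hidden @ W2 + b2)    # shape (n_rows, 1)

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y_hat = forward(X)
print(y_hat.shape)  # (4, 1)
```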

Finding the best values for the weights and the biases determines the strength of the predictions; the process of fine-tuning those parameters from the training data is called training the neural network.

Each iteration of the training process consists of the following steps:

- Calculating the predicted output $\hat{y}$; this is called feedforward

- Updating the weights and biases; this is called backpropagation

- Determining if we can improve the weights and biases by minimizing a loss function

Feedforward: calculate $\hat{y} = \sigma(W_2\,\sigma(W_1 x + b_1) + b_2)$

Loss function: we can use the sum-of-squares error, which adds up the squared difference between each predicted value ($\hat{y}_i$) and actual value ($y_i$) over the instances $i = 1..n$: $\text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
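The sum-of-squares error translates directly into NumPy. A minimal sketch with hypothetical labels and predictions (`sum_of_squares_error` is an illustrative name):

```python
import numpy as np

def sum_of_squares_error(y, y_hat):
    """Sum over all instances of the squared difference (y_i - y_hat_i)^2."""
    return float(np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2))

# Hypothetical actual labels and predictions for four instances
y = np.array([0, 1, 1, 0])
y_hat = np.array([0.1, 0.8, 0.6, 0.3])
print(sum_of_squares_error(y, y_hat))  # 0.01 + 0.04 + 0.16 + 0.09, about 0.3
```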

Implementation in Python (doesn’t use bias)

```
# import the numpy library
import numpy as np

def sigmoid(mmatrix):
    return 1 / (1 + np.exp(-mmatrix))

def sigmoid_derivative(s):
    # s must already be a sigmoid output, i.e. s = sigmoid(z);
    # the derivative of the sigmoid at z is then s * (1 - s)
    return s * (1 - s)

class NeuralNetwork:
    def __init__(self, x, y):
        self.hiddenLayerSize = 4  # hard-coded for this example
        self.input = x
        self.weights1 = np.random.rand(self.input.shape[1],
                                       self.hiddenLayerSize)
        self.weights2 = np.random.rand(self.hiddenLayerSize, 1)
        self.y = y
        self.output = np.zeros(self.y.shape)

    def feedforward(self):
        # y^ = σ(W2 σ(W1 x)): the 2-layer formula without biases
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

    def backprop(self):
        # Apply the chain rule to find the derivative of the loss
        # function with respect to weights2 and weights1.
        # 2*(y - output) is the *negative* of d(loss)/d(output),
        # so d_weights2 and d_weights1 point downhill.
        d_weights2 = np.dot(self.layer1.T,
                            2 * (self.y - self.output)
                            * sigmoid_derivative(self.output))
        d_weights1 = np.dot(self.input.T,
                            np.dot(2 * (self.y - self.output)
                                   * sigmoid_derivative(self.output),
                                   self.weights2.T)
                            * sigmoid_derivative(self.layer1))
        # Update the weights by adding the downhill step
        # (gradient descent with an implicit learning rate of 1)
        self.weights1 += d_weights1
        self.weights2 += d_weights2
```

Example:

Implementation

```
x = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]])
y = np.array([[0], [1], [1], [0]])
nn = NeuralNetwork(x, y)
for iteration in range(2000):
    nn.feedforward()
    nn.backprop()
print(nn.output)
```
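A finite-difference gradient check is one way to confirm that chain-rule expressions like those in `backprop` are right. The sketch below re-implements the bias-free network rather than reusing the class (the function names are hypothetical), computes the true gradient of the sum-of-squares loss, and compares it to a numerical estimate. The `2*(y - output)` factor in `backprop` has the opposite sign of these gradients, which is why `backprop` adds its result to the weights instead of subtracting it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(x, y, w1, w2):
    """Sum-of-squares error of the bias-free 2-layer network."""
    out = sigmoid(sigmoid(x @ w1) @ w2)
    return np.sum((y - out) ** 2)

def analytic_grads(x, y, w1, w2):
    """d(loss)/d(w1) and d(loss)/d(w2) via the chain rule."""
    layer1 = sigmoid(x @ w1)
    out = sigmoid(layer1 @ w2)
    delta2 = -2 * (y - out) * out * (1 - out)          # through the output sigmoid
    grad_w2 = layer1.T @ delta2
    delta1 = (delta2 @ w2.T) * layer1 * (1 - layer1)   # back through the hidden layer
    grad_w1 = x.T @ delta1
    return grad_w1, grad_w2

def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference estimate of df/dw, one entry at a time."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        orig = w[idx]
        w[idx] = orig + eps
        plus = f()
        w[idx] = orig - eps
        minus = f()
        w[idx] = orig                      # restore the original weight
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
w1 = rng.random((3, 4))
w2 = rng.random((4, 1))

g1, g2 = analytic_grads(x, y, w1, w2)
n1 = numeric_grad(lambda: loss(x, y, w1, w2), w1)
n2 = numeric_grad(lambda: loss(x, y, w1, w2), w2)
print(np.max(np.abs(g1 - n1)), np.max(np.abs(g2 - n2)))  # both should be tiny
```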

Library Implementation in Python

The ANN library used here is Keras

This requires installing the Python TensorFlow library (see https://www.tensorflow.org/install/pip), which in turn requires a 64-bit version of Python. To check which platform of Python you’re running:

```
import platform
import sys
print(platform.architecture())  # first element is '32bit' or '64bit'
print(sys.maxsize > 2**32)      # True on a 64-bit interpreter
```

The approach is to create a model (called a Sequential model) and add one layer at a time until you’re happy with it.

Each fully connected layer is defined with the Dense class, specifying the number of nodes and the activation function.

The Rectified Linear Unit activation function (relu) is commonly used, as is the sigmoid function (sigmoid); relu is simpler and helps models learn faster, while sigmoid is typically used on the last layer to make sure the output is between 0 and 1 when the class is binary.
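The two activation functions are easy to compare side by side. A minimal NumPy sketch using their standard textbook definitions (not Keras’s internal implementation):

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: keeps positive values, zeroes out negatives."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(z))     # [0.  0.  0.  0.5 3. ]
print(sigmoid(z))  # every value lands strictly between 0 and 1
```

Because relu’s output is unbounded above, a squashing function such as sigmoid is the one placed on the output layer when a value between 0 and 1 is needed.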

Ex: Suppose we have a dataset that has 4 attributes plus a ternary decision attribute.

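```
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Input
# assumes iris.csv is all numeric: 4 attribute columns followed by the
# class coded as 0, 1 or 2 (np.loadtxt cannot parse text class names)
dataset = np.loadtxt('iris.csv', delimiter=',')
# split into input (X) and output (y) variables
X = dataset[:, 0:4]
y = dataset[:, 4].astype(int)
# define the keras model
model = Sequential()
model.add(Input(shape=(4,)))               # 4 input attributes
model.add(Dense(4, activation='relu'))     # hidden layer
model.add(Dense(3, activation='softmax'))  # one output node per class
```

The hidden layer has 4 nodes (an arbitrary decision) and uses the relu activation function. The output layer has 3 nodes, one per class, and uses the softmax activation function, which makes the three outputs non-negative and sum to 1 so they can be read as class probabilities (sigmoid is the usual choice only when the class is binary).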

Next, we compile the model; this is where Keras uses TensorFlow’s numerical libraries and determines the best way to run the model on your hardware (e.g., CPU, GPU, or distributed).

Parameters:

- loss specifies how you want the loss to be calculated; binary_crossentropy is a good choice when the class is binary, and sparse_categorical_crossentropy suits a multi-class attribute coded as integers 0, 1, 2, ...; see the following link for other choices, including mean_squared_error: https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/

- optimizer specifies the gradient descent algorithm (i.e., how the weights are updated); the adam algorithm differs from classic gradient descent by adapting the step size for each weight from its recent gradients, which has a few benefits (see https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/)

- metrics specifies what you want to collect and report (e.g., accuracy)
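To see how Adam differs from the classic update, here is a minimal sketch on a one-parameter toy loss $f(w) = (w - 3)^2$. The toy loss, the learning rate of 0.1 (Keras’s default for adam is 0.001), and the step counts are illustrative assumptions, and this is the textbook form of Adam rather than Keras’s internal code.

```python
import numpy as np

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, lr=0.1, steps=300):
    """Classic gradient descent: a fixed-rate step against the gradient."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=300):
    """Textbook Adam: bias-corrected running averages of the gradient (m)
    and of its square (v) scale each step to the recent gradient history."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # momentum-like average of gradients
        v = beta2 * v + (1 - beta2) * g * g    # average of squared gradients
        m_hat = m / (1 - beta1 ** t)           # undo the bias toward 0 early on
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

print(gradient_descent(0.0), adam(0.0))  # both should end up near 3.0
```

Classic gradient descent moves each weight by a fixed multiple of its gradient, while Adam divides by the recent gradient magnitude, so its early steps here are roughly `lr` in size no matter how steep the loss is.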

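Ex:

```
# compile the keras model; the class is coded as integers 0-2,
# so sparse_categorical_crossentropy is used rather than mean_squared_error
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```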

Now train the model on some data

Training occurs over epochs and each epoch is split into batches

Epoch: one pass through all rows in the training dataset

Batch: one or more samples considered by the model within an epoch before weights are updated
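With the numbers used in this example (150 rows, `batch_size=100`, `epochs=500`), a plain-Python sketch of how each epoch is carved into batches. It ignores the row shuffling that Keras’s `fit` performs by default, and `batch_bounds` is a hypothetical helper, not a Keras function:

```python
import math

def batch_bounds(n_rows, batch_size):
    """Return the (start, end) row ranges of the batches in one epoch."""
    return [(start, min(start + batch_size, n_rows))
            for start in range(0, n_rows, batch_size)]

n_rows, batch_size, epochs = 150, 100, 500
bounds = batch_bounds(n_rows, batch_size)
print(bounds)                          # [(0, 100), (100, 150)]
print(math.ceil(n_rows / batch_size))  # 2 weight updates per epoch
print(epochs * len(bounds))            # 1000 weight updates in total
```

The last batch is simply smaller (50 rows) when the row count is not a multiple of the batch size.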

Ex:

```
# fit the keras model on the dataset (FYI: 150 rows in this dataset)
model.fit(X, y, epochs=500, batch_size=100)
```

Now we are ready to evaluate the model (i.e., measure how well it does on the same training dataset it was fit to)

Ex:

```
# evaluate the keras model on the dataset
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))
```

Finally, we can use the model to make predictions

Ex:

```
# make class predictions with the model: predict() returns one score per
# class, so take the index of the largest (model.predict_classes was removed)
predictions = np.argmax(model.predict(X), axis=1)
# summarize the first 25 cases
for i in range(25):
    print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i], y[i]))
```
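Turning per-class outputs into class labels, and labels into an accuracy figure, takes only `np.argmax` and a comparison. A minimal sketch with made-up scores (the arrays are hypothetical, not output from the model above):

```python
import numpy as np

# Hypothetical per-class scores for 4 rows and 3 classes
probabilities = np.array([
    [0.90, 0.08, 0.02],
    [0.10, 0.70, 0.20],
    [0.05, 0.15, 0.80],
    [0.30, 0.60, 0.10],
])
expected = np.array([0, 1, 2, 2])  # hypothetical true class labels

predicted = np.argmax(probabilities, axis=1)  # index of the largest score per row
accuracy = np.mean(predicted == expected)     # fraction of rows predicted correctly
print(predicted, accuracy)  # [0 1 2 1] 0.75
```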
