Article on Medium: How to Build Neural Network from Scratch
This notebook shows how to build a neural network from scratch.
Our neural network has three layers:
- Input layer
- Hidden layer with 3 neurons
- Output layer with 1 neuron
All layers and their parameters are hardcoded. That can be viewed as a limitation, but for illustration purposes it is the ideal setup. The network's limitations are the following:
- The input size is predetermined. We use two features, so the input size is 2
- There is exactly one hidden layer with 3 neurons; we cannot add more layers to the network
- The output size is predetermined (a single neuron), because we are working on a regression problem
Every hardcoded parameter can be changed manually, so I encourage you to play with the code: change parts of it and optimize it.
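To see where the hardcoded parameters come from, here is a small sketch that counts the weights and biases of a fully connected network from a list of layer sizes. The helper name `count_parameters` is hypothetical and not part of the notebook; for our 2-3-1 network it gives the 9 weights and 4 biases that the class initializes.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the number of units per layer, input layer first.
    Each neuron receives one weight per neuron of the previous layer
    and has exactly one bias.
    """
    weights = 0
    biases = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights += n_in * n_out
        biases += n_out
    return weights, biases


# Our network: 2 input features -> 3 hidden neurons -> 1 output neuron
print(count_parameters([2, 3, 1]))  # (9, 4)
```

The same helper shows how quickly the parameter count grows once more layers are allowed, which is one reason the hardcoded version sticks to a single hidden layer.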
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow import keras
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
California housing dataset
housing_data = fetch_california_housing()
Features = pd.DataFrame(housing_data.data, columns=housing_data.feature_names)
Target = pd.DataFrame(housing_data.target, columns=['Target'])
df = Features.join(Target)
scaler = MinMaxScaler()
df_scaled = scaler.fit_transform(df[['MedInc','AveRooms','Target']])
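`MinMaxScaler` rescales each column independently with x' = (x - min) / (max - min), so every feature and the target end up in [0, 1], which is also the output range of the sigmoid output neuron. A minimal NumPy sketch of that formula, assuming no column is constant (a zero range would divide by zero); `min_max_scale` is a hypothetical helper, not scikit-learn's implementation:

```python
import numpy as np


def min_max_scale(data):
    """Rescale each column of a 2-D array to the range [0, 1].

    Assumes max > min in every column.
    """
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    return (data - col_min) / (col_max - col_min)


sample = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 40.0]])
scaled = min_max_scale(sample)
print(scaled)
```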
X_train, X_test, y_train, y_test = train_test_split(
    df_scaled[:10000, :2],  # Features: MedInc, AveRooms
    df_scaled[:10000, 2:],  # Target
)
class SimpleNeuralNetwork:
    def __init__(self):
        '''
        Weights and biases are initialized with random values in the range [0, 1).
        There are 9 weights and 4 biases in total.
        6 weights feed the hidden layer, two for each neuron, hence 3 x 2 = 6.
        The remaining 3 weights feed the output layer.
        The same goes for the biases: each neuron in the hidden layer
        and in the output layer has its own bias.
        '''
        # Weights
        self.w1, \
        self.w2, \
        self.w3, \
        self.w4, \
        self.w5, \
        self.w6, \
        self.w7, \
        self.w8, \
        self.w9 = np.random.rand(9)

        # Biases
        self.b_n1, \
        self.b_n2, \
        self.b_n3, \
        self.b_y_hat = np.random.rand(4)
    # Activation function for the neurons
    # Each neuron applies this activation function to the weighted sum of its inputs
    # sigmoid is used for forward propagation, its derivative for backpropagation
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_der(self, x):
        return self.sigmoid(x) * (1 - self.sigmoid(x))
    def feedforward(self, x):
        '''
        Feedforward produces the network's prediction for one sample.
        First we compute the neurons of the hidden layer, then the output layer.
        '''
        # x[0], x[1] - features
        # n* - neurons in the hidden layer, y_hat - predicted value
        self.n1 = self.sigmoid(x[0]*self.w1 + x[1]*self.w2 + self.b_n1)
        self.n2 = self.sigmoid(x[0]*self.w3 + x[1]*self.w4 + self.b_n2)
        self.n3 = self.sigmoid(x[0]*self.w5 + x[1]*self.w6 + self.b_n3)
        self.y_hat = self.sigmoid(self.n1*self.w7 + self.n2*self.w8 + self.n3*self.w9 + self.b_y_hat)
    def backpropagation(self, x, y):
        '''
        Backpropagation updates all weights and biases of the network.
        Using gradient descent, each trainable parameter (weight or bias)
        takes a small step in the direction that reduces the squared error.
        Unlike forward propagation, we work from the right end of the network:
        first the output layer, then the hidden layer.
        With more than one hidden layer we would continue in the same fashion:
        "output layer" => "hidden layer 2" => "hidden layer 1"
        Must be called right after feedforward(x), which stores n1, n2, n3 and y_hat.
        '''
        # Derivative of the squared error with respect to the output neuron's weighted input
        y_hat_der = -2 * (y - self.y_hat) * self.sigmoid_der(self.n1*self.w7 + self.n2*self.w8 + self.n3*self.w9 + self.b_y_hat)
        # How the output's weighted input changes with each hidden neuron's weighted input
        n1_der = self.w7 * self.sigmoid_der(x[0]*self.w1 + x[1]*self.w2 + self.b_n1)
        n2_der = self.w8 * self.sigmoid_der(x[0]*self.w3 + x[1]*self.w4 + self.b_n2)
        n3_der = self.w9 * self.sigmoid_der(x[0]*self.w5 + x[1]*self.w6 + self.b_n3)

        # Output layer: bias and weights
        self.b_y_hat -= self.lr * y_hat_der
        self.w7 -= self.lr * y_hat_der * self.n1
        self.w8 -= self.lr * y_hat_der * self.n2
        self.w9 -= self.lr * y_hat_der * self.n3

        # Hidden layer: biases and weights
        self.b_n1 -= self.lr * y_hat_der * n1_der
        self.b_n2 -= self.lr * y_hat_der * n2_der
        self.b_n3 -= self.lr * y_hat_der * n3_der
        self.w1 -= self.lr * y_hat_der * n1_der * x[0]
        self.w2 -= self.lr * y_hat_der * n1_der * x[1]
        self.w3 -= self.lr * y_hat_der * n2_der * x[0]
        self.w4 -= self.lr * y_hat_der * n2_der * x[1]
        self.w5 -= self.lr * y_hat_der * n3_der * x[0]
        self.w6 -= self.lr * y_hat_der * n3_der * x[1]
    def fit(self, X, y, epoch=10, lr=0.01):
        '''
        Training combines forward and back propagation.
        Returns the list of MSE values recorded at the start of each epoch.
        '''
        mse_list = []
        self.lr = lr
        # Loop over epochs. Each epoch trains the network on all available data.
        # We also compute the MSE and store it to visualize the training process
        for i in range(epoch):
            mse = mean_squared_error(y, self.predict(X))
            mse_list.append(mse)
            print(f'Epoch: {i+1} / {epoch}, MSE: {round(mse, 4)}', end='\r')
            # Loop over each training example in the current epoch
            for j in range(len(X)):
                self.feedforward(X[j])
                self.backpropagation(X[j], y[j][0])
        return mse_list
    def predict(self, X):
        '''
        This function is very similar to feedforward;
        in fact, it uses feedforward to make predictions.
        The only difference is that it predicts the outcome for all samples.
        '''
        result = []
        for x in X:
            self.feedforward(x)
            result.append(self.y_hat)
        return result
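The update rules in `backpropagation` follow from the chain rule applied to the per-sample squared error $L = (y - \hat{y})^2$. Writing $z_o$ for the weighted input of the output neuron and $z_1$ for that of the first hidden neuron (the other two hidden neurons follow the same pattern with $w_3, w_4, w_8$ and $w_5, w_6, w_9$), the gradients used in the code are:

```latex
\begin{aligned}
z_1 &= w_1 x_0 + w_2 x_1 + b_{n1}, \qquad n_1 = \sigma(z_1) \\
z_o &= w_7 n_1 + w_8 n_2 + w_9 n_3 + b_{\hat{y}}, \qquad \hat{y} = \sigma(z_o) \\
\sigma'(z) &= \sigma(z)\,\bigl(1 - \sigma(z)\bigr) \\
\delta_o = \frac{\partial L}{\partial z_o} &= -2\,(y - \hat{y})\,\sigma'(z_o) \\
\frac{\partial L}{\partial b_{\hat{y}}} &= \delta_o, \qquad
\frac{\partial L}{\partial w_7} = \delta_o\, n_1 \\
\frac{\partial L}{\partial b_{n1}} &= \delta_o\, w_7\, \sigma'(z_1), \qquad
\frac{\partial L}{\partial w_1} = \delta_o\, w_7\, \sigma'(z_1)\, x_0, \qquad
\frac{\partial L}{\partial w_2} = \delta_o\, w_7\, \sigma'(z_1)\, x_1 \\
\theta &\leftarrow \theta - \eta\, \frac{\partial L}{\partial \theta}
\end{aligned}
```

Here $\delta_o$ corresponds to `y_hat_der`, $w_7\,\sigma'(z_1)$ to `n1_der`, $x_0, x_1$ to `x[0], x[1]`, and $\eta$ to the learning rate `self.lr`.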
model = SimpleNeuralNetwork()
history = model.fit(X_train, y_train, epoch=100, lr=0.05)
Epoch: 100 / 100, MSE: 0.0286
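A falling MSE suggests the updates point in the right direction, but a finite-difference gradient check tests the hand-derived formulas directly: nudge each parameter by a tiny amount, measure the change in squared error, and compare with the analytic gradient. The sketch below re-implements the 2-3-1 network as standalone functions; `forward`, `analytic_grad` and `numeric_grad` are hypothetical names, not part of the notebook, and the 13 parameters are packed into one array in the order w1..w9, b_n1, b_n2, b_n3, b_y_hat.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_der(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def forward(p, x):
    """Return hidden weighted inputs, hidden outputs, output weighted input, prediction."""
    w, b = p[:9], p[9:]
    z = np.array([x[0]*w[0] + x[1]*w[1] + b[0],
                  x[0]*w[2] + x[1]*w[3] + b[1],
                  x[0]*w[4] + x[1]*w[5] + b[2]])
    n = sigmoid(z)
    z_o = n[0]*w[6] + n[1]*w[7] + n[2]*w[8] + b[3]
    return z, n, z_o, sigmoid(z_o)


def loss(p, x, y):
    return (y - forward(p, x)[3]) ** 2


def analytic_grad(p, x, y):
    """Gradients built from the same formulas as SimpleNeuralNetwork.backpropagation."""
    w = p[:9]
    z, n, z_o, y_hat = forward(p, x)
    y_hat_der = -2 * (y - y_hat) * sigmoid_der(z_o)
    n_der = w[6:9] * sigmoid_der(z)  # n1_der, n2_der, n3_der
    g = np.zeros_like(p)
    for k in range(3):
        g[2*k] = y_hat_der * n_der[k] * x[0]      # w1, w3, w5
        g[2*k + 1] = y_hat_der * n_der[k] * x[1]  # w2, w4, w6
        g[6 + k] = y_hat_der * n[k]               # w7, w8, w9
        g[9 + k] = y_hat_der * n_der[k]           # b_n1, b_n2, b_n3
    g[12] = y_hat_der                             # b_y_hat
    return g


def numeric_grad(p, x, y, eps=1e-6):
    """Central finite differences of the loss with respect to each parameter."""
    g = np.zeros_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = eps
        g[i] = (loss(p + step, x, y) - loss(p - step, x, y)) / (2 * eps)
    return g


rng = np.random.default_rng(0)
params = rng.random(13)  # random values in [0, 1), like np.random.rand in the class
x_sample, y_sample = np.array([0.3, 0.7]), 0.5
diff = np.max(np.abs(analytic_grad(params, x_sample, y_sample)
                     - numeric_grad(params, x_sample, y_sample)))
print(diff)
```

A maximum difference far below 1e-6 indicates that the analytic formulas agree with the true derivative of the loss; a sign or index mistake in any update would show up as a difference comparable to the gradients themselves.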
plt.rcParams['figure.dpi'] = 227
plt.rcParams['figure.figsize'] = (16, 6)
plt.style.use('seaborn-v0_8-whitegrid')  # named 'seaborn-whitegrid' before Matplotlib 3.6
plt.plot(history)
plt.title('Neural Network Training Process', fontsize=15)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('MSE', fontsize=12)
plt.show()
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
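keras_model = keras.Sequential()
keras_model.add(keras.Input(shape=(2,)))  # two input features, as in the simple network
keras_model.add(keras.layers.Dense(3, activation='sigmoid', use_bias=True))
keras_model.add(keras.layers.Dense(1, activation='sigmoid', use_bias=True))
# Assumption: the compile settings are not shown in the source; plain SGD with MSE loss mirrors the simple network
keras_model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.05), loss='mse')
keras_model.summary()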
Model: "sequential"
Layer (type) Output Shape Param #
dense (Dense) (None, 3) 9
dense_1 (Dense) (None, 1) 4
Total params: 13
Trainable params: 13
Non-trainable params: 0
history_keras = keras_model.fit(X_train, y_train, epochs=100, verbose=0)
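plt.rcParams['figure.dpi'] = 227
plt.rcParams['figure.figsize'] = (16, 6)
plt.style.use('seaborn-v0_8-whitegrid')  # named 'seaborn-whitegrid' before Matplotlib 3.6
plt.plot(history_keras.history['loss'])  # training loss recorded by Keras for each epoch
plt.title('Keras Neural Network Training Process', fontsize=15)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('MSE', fontsize=12)
plt.show()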
y_pred_keras = keras_model.predict(X_test)
mse_keras = mean_squared_error(y_test, y_pred_keras)
print("MSE Simple:", round(mse, 3))
print("MSE Keras: ", round(mse_keras, 3))
MSE Simple: 0.028
MSE Keras: 0.028
print("R-squared Simple:", round(r2_score(y_test, y_pred), 3))
print("R-squared Keras: ", round(r2_score(y_test, y_pred_keras), 3))
R-squared Simple: 0.497
R-squared Keras: 0.502
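Both metrics come from scikit-learn, but their definitions are short: MSE is the mean of the squared residuals, and R-squared is 1 - SS_res / SS_tot, the fraction of the target's variance explained by the model. A plain-Python sketch, assuming 1-D lists of targets and predictions; `mse` and `r_squared` are hypothetical helper names:

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(mse(y_true, y_pred), r_squared(y_true, y_pred))
```

Read this way, an R-squared of about 0.5 means each model explains roughly half of the variance of the scaled target on the test set.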
Our simple neural network performs on par with its Keras analog: both reach a test MSE of 0.028, and their R-squared values differ only in the third decimal (0.497 vs 0.502). This indicates that the from-scratch forward and backward passes work correctly.
There are plenty of things we can do with our neural network, such as:
- Rewrite feedforward and backpropagation to the matrix form
- Support more than one hidden layer, with as many neurons as we want
- Add other activation functions, like Tanh or ReLU
- Add other cost functions, like binary cross-entropy for classification problems
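The first item on the list can be sketched for the forward pass: the three hidden neurons collapse into one matrix product, and a whole batch is processed at once. This is a minimal NumPy sketch, assuming the weights are arranged as a (2, 3) matrix `W1` and a (3, 1) matrix `W2` (hypothetical names); it checks that the matrix form gives the same predictions as the scalar formulas used in `feedforward`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def feedforward_matrix(X, W1, b1, W2, b2):
    """Forward pass for a batch X of shape (n_samples, 2).

    W1: (2, 3) input-to-hidden weights, b1: (3,) hidden biases
    W2: (3, 1) hidden-to-output weights, b2: (1,) output bias
    Returns predictions of shape (n_samples, 1).
    """
    hidden = sigmoid(X @ W1 + b1)     # (n_samples, 3)
    return sigmoid(hidden @ W2 + b2)  # (n_samples, 1)


def feedforward_scalar(x, w, b):
    """Scalar version with the same layout as SimpleNeuralNetwork.feedforward.

    w holds w1..w9 and b holds b_n1, b_n2, b_n3, b_y_hat.
    """
    n1 = sigmoid(x[0]*w[0] + x[1]*w[1] + b[0])
    n2 = sigmoid(x[0]*w[2] + x[1]*w[3] + b[1])
    n3 = sigmoid(x[0]*w[4] + x[1]*w[5] + b[2])
    return sigmoid(n1*w[6] + n2*w[7] + n3*w[8] + b[3])


rng = np.random.default_rng(1)
w, b = rng.random(9), rng.random(4)
# Column j of W1 holds the two incoming weights of hidden neuron j: (w1, w2), (w3, w4), (w5, w6)
W1 = np.array([[w[0], w[2], w[4]],
               [w[1], w[3], w[5]]])
W2 = w[6:9].reshape(3, 1)
X = rng.random((5, 2))
batch = feedforward_matrix(X, W1, b[:3], W2, b[3:])
loop = np.array([feedforward_scalar(x, w, b) for x in X]).reshape(-1, 1)
print(np.allclose(batch, loop))  # True
```

Besides being shorter, the matrix form makes adding layers a matter of appending another (weights, biases) pair, which also addresses the second item on the list.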