
Implement a Matrix Based Dense Neural Network for Character Recognition on the MNIST Dataset

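Implement a matrix-based Dense Neural Network for character recognition on the MNIST dataset. The initial architecture you will program is a two-layer network with 784 inputs (one per pixel of a 28×28 image), a hidden layer of 100 neurons, and an output layer of 10 neurons (one per digit class). Both layers use sigmoid activations, as in the skeleton code.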

a) Show the matrix equations for the forward pass, i.e., for computing S1, a1, S2 and a2.
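The equations below assume sigmoid activations in both layers, the activation used in the skeleton's testing loop. Here x is the 784×1 column vector of one flattened image. The shapes assume the 100-neuron hidden layer of the skeleton, so the forward pass can be written as:

```latex
\begin{aligned}
S_1 &= w_1 x + b_1, & w_1 &\in \mathbb{R}^{100\times 784},\; b_1 \in \mathbb{R}^{100\times 1}\\
a_1 &= \sigma(S_1)\\
S_2 &= w_2 a_1 + b_2, & w_2 &\in \mathbb{R}^{10\times 100},\; b_2 \in \mathbb{R}^{10\times 1}\\
a_2 &= \sigma(S_2), & \sigma(z) &= \frac{1}{1+e^{-z}} \quad \text{(applied element-wise)}
\end{aligned}
```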

b) Show the matrix equations (partial derivatives) for the backpropagation algorithm i.e., for computing:

δ2, δ1, ∇w2, ∇b2, ∇w1, ∇b1
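The derivation below takes the squared-error loss used in the skeleton, L = ½‖a2 − y‖², where y is the one-hot label. It defines each δ as the derivative of the loss with respect to that layer's S and uses σ′(S) = σ(S)(1 − σ(S)). Under these assumptions, one way to write the chain rule is shown next. Here ⊙ is element-wise multiplication, i.e. the `*` operator on NumPy arrays.

```latex
\begin{aligned}
\delta_2 &= \frac{\partial L}{\partial S_2} = (a_2 - y) \odot a_2 \odot (1 - a_2)\\
\nabla w_2 &= \delta_2\, a_1^{\mathsf T}, \qquad \nabla b_2 = \delta_2\\
\delta_1 &= \frac{\partial L}{\partial S_1} = \left(w_2^{\mathsf T} \delta_2\right) \odot a_1 \odot (1 - a_1)\\
\nabla w_1 &= \delta_1\, x^{\mathsf T}, \qquad \nabla b_1 = \delta_1
\end{aligned}
```

A quick shape check confirms the products: δ2 is 10×1 and a1ᵀ is 1×100, so ∇w2 is 10×100 like w2. Likewise, w2ᵀδ2 is 100×1, so δ1 has the shape of a1.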

c) Program the backpropagation algorithm for training and testing on the MNIST dataset. The dataset is provided on the CPEG 586 web site. Use only 1000 images for training and all 10000 images for testing. After you download and unzip the data folder, it contains three subfolders: Training1000, Test10000, and TrainingAll60000.

The following Python code reads the training and testing images. Change the folder names in it to match the folder where you unzipped the data.

Note: np.dot computes a dot product (inner product) if the two arguments are 1-D arrays, but a matrix multiplication if the arguments are 2-D arrays. Similarly, np.multiply performs element-by-element multiplication. The regular * operator also multiplies NumPy arrays element by element, so you do not have to use np.multiply.
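A small illustration of the note above follows. The toy arrays are made up for the example. The column vector `x` has the same (n, 1) layout as the flattened images `trainX[i]`.

```python
import numpy as np

# 1-D arrays: np.dot returns the inner product (a scalar)
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
inner = np.dot(u, v)          # 1*4 + 2*5 + 3*6 = 32.0

# 2-D arrays: np.dot performs matrix multiplication
W = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([[1.0],
              [1.0]])         # 2x1 column vector, like one flattened image
Wx = np.dot(W, x)             # [[3.], [7.]]

# * is element-wise, exactly like np.multiply
elementwise = W * W           # [[1., 4.], [9., 16.]]
```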

import os
import cv2
import numpy as np
from sklearn.utils import shuffle

train = np.empty((1000, 28, 28), dtype='float64')
trainY = np.zeros((1000, 10, 1))
test = np.empty((10000, 28, 28), dtype='float64')
testY = np.zeros((10000, 10, 1))

# Load in the training images; the first character of each filename is its digit label
i = 0
for filename in os.listdir('D:/Data/Training1000/'):
    y = int(filename[0])
    trainY[i, y] = 1.0  # one-hot encode the label
    # flag 0 reads the image as grayscale (for color, use 1); divide to scale pixels to [0, 1]
    train[i] = cv2.imread('D:/Data/Training1000/{0}'.format(filename), 0) / 255.0
    i = i + 1

# Load in the test images
i = 0
for filename in os.listdir('D:/Data/Test10000'):
    y = int(filename[0])
    testY[i, y] = 1.0
    test[i] = cv2.imread('D:/Data/Test10000/{0}'.format(filename), 0) / 255.0
    i = i + 1

# Flatten each 28x28 image into a 784x1 column vector
trainX = train.reshape(train.shape[0], train.shape[1] * train.shape[2], 1)
testX = test.reshape(test.shape[0], test.shape[1] * test.shape[2], 1)

Try to program the training and testing code yourself. If you are having difficulty, then use the following skeleton code:

numNeuronsLayer1 = 100
numNeuronsLayer2 = 10
numEpochs = 100
w1 = np.random.uniform(low=-0.1, high=0.1, size=(numNeuronsLayer1, 784))
b1 = np.random.uniform(low=-1, high=1, size=(numNeuronsLayer1, 1))
w2 = np.random.uniform(low=-0.1, high=0.1, size=(numNeuronsLayer2, numNeuronsLayer1))
b2 = np.random.uniform(low=-0.1, high=0.1, size=(numNeuronsLayer2, 1))
learningRate = 0.1
for n in range(0, numEpochs):
    loss = 0
    # shuffle data for stochastic behavior
    trainX, trainY = shuffle(trainX, trainY)
    for i in range(trainX.shape[0]):
        # do forward pass
        # your equations for the forward pass (compute s1, a1, s2, a2)
        loss += (0.5 * ((a2 - trainY[i]) * (a2 - trainY[i]))).sum()
        # equivalently: loss += (0.5 * np.multiply((a2 - trainY[i]), (a2 - trainY[i]))).sum()
        # do backprop: your equations for computing the deltas and the gradients
        # (the * operator also works instead of np.multiply)
        # adjust the weights
        w2 = w2 - learningRate * gradw2
        b2 = b2 - learningRate * gradb2
        w1 = w1 - learningRate * gradw1
        b1 = b1 - learningRate * gradb1
    print("epoch = " + str(n) + " loss = " + str(loss))
print("done training, starting testing..")
accuracyCount = 0
for i in range(testY.shape[0]):
    # do forward pass
    s1 = np.dot(w1, testX[i]) + b1
    a1 = 1 / (1 + np.exp(-1 * s1))  # np.exp operates element-wise on the array
    s2 = np.dot(w2, a1) + b2
    a2 = 1 / (1 + np.exp(-1 * s2))
    # determine index of maximum output value
    a2index = a2.argmax()
    if testY[i, a2index, 0] == 1:
        accuracyCount = accuracyCount + 1
# report the fraction of correctly classified test images once, after the loop
print("Accuracy = " + str(accuracyCount / float(testY.shape[0])))
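The sketch below shows one way to fill in the two placeholder steps of the skeleton. It assumes sigmoid activations in both layers and the skeleton's squared-error loss. The helper names `sigmoid`, `forward` and `gradients` are hypothetical, introduced only for illustration. The gradients follow the chain rule for this loss: δ2 = (a2 − y)·a2·(1 − a2) and δ1 = (w2ᵀδ2)·a1·(1 − a1).

```python
import numpy as np


def sigmoid(s):
    # element-wise logistic function, as in the skeleton's testing loop
    return 1.0 / (1.0 + np.exp(-s))


def forward(w1, b1, w2, b2, x):
    # x is one flattened image as a column vector, e.g. shape (784, 1)
    s1 = np.dot(w1, x) + b1
    a1 = sigmoid(s1)
    s2 = np.dot(w2, a1) + b2
    a2 = sigmoid(s2)
    return a1, a2


def gradients(w1, b1, w2, b2, x, y):
    """Forward pass plus backprop for one sample, loss = 0.5 * sum((a2 - y)**2)."""
    a1, a2 = forward(w1, b1, w2, b2, x)
    loss = (0.5 * (a2 - y) * (a2 - y)).sum()
    # each delta is dLoss/ds for its layer; sigmoid'(s) = a * (1 - a)
    delta2 = (a2 - y) * a2 * (1 - a2)              # shape (numNeuronsLayer2, 1)
    delta1 = np.dot(w2.T, delta2) * a1 * (1 - a1)  # shape (numNeuronsLayer1, 1)
    gradw2 = np.dot(delta2, a1.T)                  # same shape as w2
    gradb2 = delta2
    gradw1 = np.dot(delta1, x.T)                   # same shape as w1
    gradb1 = delta1
    return loss, gradw1, gradb1, gradw2, gradb2
```

To use it in the skeleton, replace the explicit forward-pass and loss lines in the inner loop with `l, gradw1, gradb1, gradw2, gradb2 = gradients(w1, b1, w2, b2, trainX[i], trainY[i])` followed by `loss += l`. The weight-update lines stay as they are.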

d) The code above implements Stochastic Gradient Descent (SGD), i.e., the weights and biases are updated after each training image. Implement mini-batch gradient descent, where the gradients are accumulated over a specified batch size (e.g., 10) and the accumulated gradients are then used to update the weights and biases.
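The sketch below covers part (d) and assumes the same sigmoid network and squared-error loss as the skeleton. The function names `backprop` and `train_minibatch_epoch` are hypothetical. It also makes one assumption worth noting: the accumulated gradients are divided by the batch size before the update. This keeps the step size comparable to SGD for the same learningRate. Using the plain sum is the other common choice and only rescales the learning rate.

```python
import numpy as np


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


def backprop(w1, b1, w2, b2, x, y):
    # one-sample forward pass and gradients for loss 0.5 * sum((a2 - y)**2)
    a1 = sigmoid(np.dot(w1, x) + b1)
    a2 = sigmoid(np.dot(w2, a1) + b2)
    delta2 = (a2 - y) * a2 * (1 - a2)
    delta1 = np.dot(w2.T, delta2) * a1 * (1 - a1)
    loss = (0.5 * (a2 - y) ** 2).sum()
    return loss, np.dot(delta1, x.T), delta1, np.dot(delta2, a1.T), delta2


def train_minibatch_epoch(w1, b1, w2, b2, X, Y, learningRate=0.1, batchSize=10):
    """One epoch of mini-batch gradient descent; X is (N, 784, 1), Y is (N, 10, 1)."""
    loss = 0.0
    for start in range(0, X.shape[0], batchSize):
        # reset the gradient accumulators at the start of every batch
        gw1, gb1 = np.zeros_like(w1), np.zeros_like(b1)
        gw2, gb2 = np.zeros_like(w2), np.zeros_like(b2)
        end = min(start + batchSize, X.shape[0])
        for i in range(start, end):
            l, dw1, db1, dw2, db2 = backprop(w1, b1, w2, b2, X[i], Y[i])
            loss += l
            gw1 += dw1
            gb1 += db1
            gw2 += dw2
            gb2 += db2
        m = end - start  # size of this batch (the last one may be smaller)
        # one update per batch, using the averaged accumulated gradients
        w1 = w1 - learningRate * gw1 / m
        b1 = b1 - learningRate * gb1 / m
        w2 = w2 - learningRate * gw2 / m
        b2 = b2 - learningRate * gb2 / m
    return w1, b1, w2, b2, loss
```

In the skeleton's epoch loop, keep the shuffle and replace the inner loop with `w1, b1, w2, b2, loss = train_minibatch_epoch(w1, b1, w2, b2, trainX, trainY, learningRate, 10)`. With `batchSize=1` the function reduces to the per-sample SGD updates of the skeleton, which makes the two methods easy to compare.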

e) Compare mini-batch gradient descent with SGD and determine which gives better test accuracy for 25, 50, 100 and 150 epochs, with 25, 50, 100 and 150 neurons in the hidden layer. Graph the results using Matplotlib in Python. Also implement the tanh and ReLU activation functions and repeat the experiments for the same hidden-layer sizes and numbers of epochs.
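For the tanh and ReLU part of (e), the hidden-layer activation can be made swappable. The sketch below assumes tanh or ReLU replaces the sigmoid only in the hidden layer. The output layer keeps the sigmoid so that a2 stays in [0, 1] like the one-hot targets; that choice is an assumption, not part of the assignment. Each derivative is written in terms of the activation output a, so in backprop the factor `a1 * (1 - a1)` in δ1 becomes `deriv(a1)`. The names `ACTIVATIONS`, `relu_deriv` and the others are hypothetical.

```python
import numpy as np


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


def sigmoid_deriv(a):
    # d/ds sigmoid(s) expressed through a = sigmoid(s)
    return a * (1 - a)


def tanh(s):
    return np.tanh(s)


def tanh_deriv(a):
    # d/ds tanh(s) = 1 - tanh(s)**2, expressed through a = tanh(s)
    return 1 - a * a


def relu(s):
    return np.maximum(0, s)


def relu_deriv(a):
    # 1 where the unit is active (s > 0, equivalently a > 0), else 0;
    # the value at s = 0 is taken as 0 by convention
    return (a > 0).astype(a.dtype)


# hidden-layer choices: name -> (activation, derivative in terms of the output)
ACTIVATIONS = {
    "sigmoid": (sigmoid, sigmoid_deriv),
    "tanh": (tanh, tanh_deriv),
    "relu": (relu, relu_deriv),
}
```

For example, after `act, deriv = ACTIVATIONS["relu"]`, the forward pass uses `a1 = act(np.dot(w1, x) + b1)`. Backprop then uses `delta1 = np.dot(w2.T, delta2) * deriv(a1)`. The rest of the training and testing code is unchanged.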

If you need any help with a deep learning project, we are ready to help you; send us your requirement details.

The Realcode4you team provides full support for any deep learning project or assignment at an affordable price.
