Programming Neural Networks Using PyTorch

PyTorch is a popular library for programming deep neural networks. Its coding style is easy to follow. At the time of this writing, the latest version of PyTorch is 1.10. Before we start programming in PyTorch, we need to create a Python environment and download the PyTorch library.


Installing PyTorch: Close Visual Studio if it is running. Launch the command prompt as an administrator. Then issue the following sequence of commands, one at a time.

conda create -n pytorch1x pip python=3.9 
activate pytorch1x 
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch 


The default environment will be shown in bold.


Then type the following code in a file called PyTorchTest.py:


import sys 
import numpy as np 
import torch 

def main():  
	x = np.arange(100)  
	#----------- use gpu if available else cpu------------  
	device = 'cuda' if torch.cuda.is_available() else 'cpu'  
	print('Pytorch will use: ', device)  
	#----convert numpy to tensor-----  
	x_tensor = torch.from_numpy(x).float().to(device) 
	print(type(x_tensor)) 
if __name__ == "__main__":  
	sys.exit(int(main() or 0)) 


If you have a GPU available on your machine, the output will appear as:

Pytorch will use:  cuda
<class 'torch.Tensor'>


Otherwise, it will appear as:

Pytorch will use:  cpu
<class 'torch.Tensor'>


You can still use PyTorch even if you do not have a GPU. As you can see from the above test code, the typical code to determine if you have a GPU or not is:

device = 'cuda' if torch.cuda.is_available() else 'cpu' 
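
The same check can also produce a torch.device object, which can then be passed directly when a tensor is created. The following is a minimal sketch; the variable names are illustrative and are not part of the test program above.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Create a tensor directly on the chosen device.
x = torch.arange(5, dtype=torch.float, device=device)

print(device.type)     # 'cuda' or 'cpu', depending on the machine
print(x.device.type)   # same device type as the line above
```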

The key concept in PyTorch is that numpy arrays have to be converted to PyTorch tensors, e.g.

x = np.arange(100) 
x_tensor = torch.from_numpy(x).float().to(device)

converts the numpy array x to x_tensor. The tensors will be computed on the CPU or the GPU.

To convert a tensor back to numpy:

x_cpu = x_tensor.cpu().numpy()
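
The two conversions can be combined into a small round trip. The sketch below stays on the CPU so that it runs on any machine; the name x_back is only used here for illustration.

```python
import numpy as np
import torch

x = np.arange(5)                         # integer NumPy array
x_tensor = torch.from_numpy(x).float()   # float32 tensor on the CPU

# .cpu() does nothing for a CPU tensor, but it is required before .numpy()
# when the tensor lives on the GPU.
x_back = x_tensor.cpu().numpy()

print(x_tensor.dtype)   # torch.float32
print(x_back.dtype)     # float32
```

Note that .float() changes the data type to 32-bit floating point, so the array that comes back is float32 even though the original array held integers.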

An important concept in PyTorch is the pairing of a Dataset and a DataLoader. The Dataset is responsible for providing access to the training and/or test data, one item at a time. The DataLoader wraps a Dataset and returns the data one batch at a time as tensors. Each batch is then placed on the CPU or the GPU, depending on GPU availability; in the programs that follow, this is done by calling .to(device) on the batch.
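
As a small illustration of batching, the sketch below uses PyTorch's built-in TensorDataset rather than a hand-written dataset class. The data (ten points on the line y = 2x + 0.4) and the batch size of 4 are assumptions chosen only for this example.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

x = torch.arange(10, dtype=torch.float)
y = 2 * x + 0.4

dataset = TensorDataset(x, y)                # pairs x[i] with y[i]
loader = DataLoader(dataset, batch_size=4)   # no shuffling by default

batch_sizes = []
for x_batch, y_batch in loader:
    # each iteration yields one batch of inputs and matching targets
    batch_sizes.append(len(x_batch))

print(batch_sizes)   # [4, 4, 2]
```

The last batch is smaller because 10 is not a multiple of 4; the DataLoader keeps such a partial batch unless told otherwise.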


PyTorch provides a library called "AutoGrad" which computes the partial derivatives of the loss function automatically for us. Further, it provides a simple function to update the weights and biases. Typically, the following lines are used in the training loop.


y_pred = model(x)             # implicitly calls the forward function
loss = loss_fn(y_pred, y)     # compute the loss
loss.backward()               # compute gradients
optimizer.step()              # update weights, biases
optimizer.zero_grad()         # clear gradients
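
To see what the automatic gradient computation produces, it helps to try it on a function whose derivative is easy to work out by hand. The sketch below uses a made-up loss, (2w - 1)^2, with a single trainable value w; it is not part of the training programs in this section.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)   # a single trainable value
loss = (2.0 * w - 1.0) ** 2                 # loss = (2w - 1)^2

loss.backward()                             # autograd fills in w.grad

# By hand: d(loss)/dw = 2 * (2w - 1) * 2 = 4 * (2w - 1), which is 20 at w = 3.
print(w.grad.item())   # 20.0
```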

Let's create a simple single-neuron network in PyTorch. Create a Python application project called PyTorch1. Add a class called MyDataSet with the following code in it:


from torch.utils.data import Dataset, TensorDataset 
import torch 

class MyDataSet(Dataset):  
	def __init__(self, x_tensor, y_tensor):  
		self.x = x_tensor
		self.y = y_tensor
	def __getitem__(self, index):  
		return (self.x[index], self.y[index])  
	def __len__(self):  
		return len(self.x) 

As you can see from the above code, the DataSet stores the x_tensor and the y_tensor, and makes one item available to the caller via the __getitem__ method. The __len__ method reports how many items are available.
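
The two special methods are what make indexing and len() work on a dataset object. The sketch below repeats the same pattern under a hypothetical class name, PairDataset, so that it can be run on its own.

```python
import torch
from torch.utils.data import Dataset

class PairDataset(Dataset):   # hypothetical name; same pattern as MyDataSet
    def __init__(self, x_tensor, y_tensor):
        self.x = x_tensor
        self.y = y_tensor

    def __getitem__(self, index):
        return (self.x[index], self.y[index])

    def __len__(self):
        return len(self.x)

x = torch.arange(5, dtype=torch.float)
ds = PairDataset(x, 2 * x + 0.4)

n_items = len(ds)   # calls __len__
x0, y0 = ds[0]      # calls __getitem__(0)
print(n_items)      # 5
print(x0, y0)       # the first (input, target) pair
```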


Add a class called SimpleModel to the project with the following code in it.


import torch 
class SimpleModel(torch.nn.Module):  
	def __init__(self):  
		super().__init__()  
		self.w = torch.nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))
		self.b = torch.nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))

	def forward(self, x):  
		return self.w * x + self.b

The above code creates a single-neuron network with two learnable parameters, w and b, that are randomly initialized. The forward function computes wx + b for the neuron. As you can see from the above code, the learnable variables in PyTorch are declared as torch.nn.Parameter. The requires_grad=True indicates that the gradient of the loss with respect to this variable will be computed.
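
Declaring a variable as torch.nn.Parameter also registers it with the module, which is how model.parameters() later finds it for the optimizer. The sketch below contrasts a Parameter with a plain tensor attribute; TinyModel and its scale attribute are hypothetical and exist only for this illustration.

```python
import torch

class TinyModel(torch.nn.Module):   # hypothetical model for illustration
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(1))   # registered as learnable
        self.scale = torch.ones(1)                    # plain tensor, not registered

    def forward(self, x):
        return self.w * x * self.scale

model = TinyModel()
param_names = [name for name, _ in model.named_parameters()]

print(param_names)             # ['w']
print(model.w.requires_grad)   # True
```

Only w shows up in the list, so only w would be updated by an optimizer built from model.parameters().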


Type the following code in the PyTorch1.py file. The following program first trains the simple model for learning the equation y = 2x + 0.4 without using a DataLoader. The second part of the program uses the DataLoader approach to train the single neuron network. Study the following code carefully to understand the PyTorch concepts.


import sys 
import numpy as np 
import torch 
import random 
from SimpleModel import SimpleModel 
from MyDataSet import MyDataSet 
from torch.utils.data import DataLoader 
from torch.utils.data.dataset import random_split

def main():  
	x = np.arange(100)  
	#----------- use gpu if available else cpu------------  
	device = 'cuda' if torch.cuda.is_available() else 'cpu'    
	#----convert numpy to tensor-----  
	x_tensor = torch.from_numpy(x).float().to(device)  
	print(type(x_tensor))  
	#----convert tensor back to numpy  
	x_cpu = x_tensor.cpu().numpy() 

	# first convert tensor to numpy  
	print(type(x_cpu))  
	#--requires_grad = True or False to make a variable trainable or not  
	w = torch.randn(1, requires_grad=True, dtype=torch.float).to(device)  
	b = torch.randn(1, requires_grad=True, dtype=torch.float).to(device)  
	print(w)  
	# We can either create regular tensors and send them to the device   
	a = torch.randn(1, dtype=torch.float).to(device)  
	b = torch.randn(1, dtype=torch.float).to(device)  
	# and THEN set them as requiring gradients...  
	a.requires_grad_()  
	b.requires_grad_(False)  
	print(a)
	print(b)
	# We can specify the device at the moment of creation - RECOMMENDED!  
	torch.manual_seed(42)  
	a = torch.randn(1, requires_grad=True, dtype=torch.float, device=device)  
	b = torch.randn(1, requires_grad=True, dtype=torch.float, device=device)  
	print(a)  
	print(b)  

	# methods that end with _ do inplace modification  
	# loss.backward() to compute gradients  
	# .zero_() to zero out gradients  
	# .grad attribute to examine the value of the gradient for a given tensor  
	w = torch.randn(1, requires_grad=True, dtype=torch.float, device=device)  
	b = torch.randn(1, requires_grad=True, dtype=torch.float, device=device)  
	lr = 1e-2  
	n_epochs = 1000
  
	# Training data for y = 2x + 0.4 (the MSE loss function is defined below)
	x_train = np.arange(0,10)  
	y_train = x_train * 2 + 0.4 # y = 2x + 0.4 
   
	x_train_tensor = torch.from_numpy(x_train).float().to(device)  
	y_train_tensor = torch.from_numpy(y_train).float().to(device) 
   
	loss_fn = torch.nn.MSELoss(reduction='mean')  
	model = SimpleModel().to(device)  
	model.train() # set the model in train mode

	# optimizer = torch.optim.SGD([w, b], lr=lr)  
	optimizer = torch.optim.SGD(model.parameters(), lr=lr) 
   
	for epoch in range(n_epochs):  
		#aout = w * x_train_tensor + b  
		aout = model(x_train_tensor)  
		loss = loss_fn(y_train_tensor, aout)  
		loss.backward() # compute gradients   
		optimizer.step() # update weights, biases  
		optimizer.zero_grad() # clear gradients  
	#print(w)  
	#print(b)  
	print(model.state_dict()) # model's weights and parameters  

	#--------------DataLoader version-------------------  
	x_train = np.arange(0,10)  
	y_train = x_train * 2 + 0.4 # y = 2x + 0.4  

	# without .to(device), it is a CPU Tensor  
	x_train_tensor = torch.from_numpy(x_train).float()  
	y_train_tensor = torch.from_numpy(y_train).float()  
	mydataset = MyDataSet(x_train_tensor, y_train_tensor)  
	train_dataset, val_dataset = random_split(mydataset, [8, 2])  
	train_loader = DataLoader(dataset=train_dataset, batch_size=4)  
	val_loader = DataLoader(dataset=val_dataset, batch_size=2)  
	print(train_dataset[0])
  
	#train_loader = DataLoader(dataset=train_dataset, batch_size=4, shuffle=True)    
	losses = []  
	val_losses = []  
	lr = 1e-2  
	n_epochs = 100  
	loss_fn = torch.nn.MSELoss(reduction='mean')  
	model = SimpleModel().to(device)  
	model.train() # set the model in train mode
	# optimizer = torch.optim.SGD([w, b], lr=lr)  
	optimizer = torch.optim.SGD(model.parameters(), lr=lr) 
   
	for epoch in range(n_epochs):  
		for x_batch, y_batch in train_loader:
			x_batch = x_batch.to(device) # load data in GPU
			y_batch = y_batch.to(device)
			aout = model(x_batch)
			loss = loss_fn(y_batch, aout)
			loss.backward() # compute gradients
			optimizer.step() # update weights, biases
			optimizer.zero_grad() # clear gradients
			losses.append(loss.item()) # store the loss as a Python float
	
		with torch.no_grad(): # turn off gradient calculation
		
			for x_val, y_val in val_loader:  
				x_val = x_val.to(device)  
				y_val = y_val.to(device)  
				model.eval() # set model to evaluation mode
				aout = model(x_val)  
				val_loss = loss_fn(y_val, aout)  
				val_losses.append(val_loss.item())  
				print('epoch ' + str(epoch) + ' validation loss = ' + str(val_loss.item()))
	print(model.state_dict())   

if __name__ == "__main__":  
	sys.exit(int(main() or 0)) 

If you run the program, the output appears as:






Learning Regression:

If the predicted output from a network is expected to be a numeric value (as opposed to the class of the output), the problem is referred to as a regression problem. Suppose we want to approximate the sine function from -pi to pi by a third-degree polynomial as:


y = sin(x)

approximate with:

y = ax^3 + bx^2 + cx + d


The goal is to determine the appropriate values of a, b, c and d.

The inputs to the system are x, x^2, and x^3.
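
To make the idea of three inputs concrete, the sketch below builds them as columns of a matrix (plus a column of ones for d) and obtains reference values for a, b, c and d with NumPy's least-squares solver. This is a different technique from the gradient-descent training used in the rest of this section and is shown only as a point of comparison; the names features and coeffs are illustrative.

```python
import math
import numpy as np

x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)

# One row per sample, one column per input: x^3, x^2, x, and a constant 1 for d.
features = np.stack([x**3, x**2, x, np.ones_like(x)], axis=1)

# Least-squares solution for [a, b, c, d]; this is NOT gradient descent.
coeffs, *_ = np.linalg.lstsq(features, y, rcond=None)
a, b, c, d = coeffs

print(features.shape)   # (2000, 4)
print(a, b, c, d)
```

Because the sine function is odd and the sample points are symmetric about zero, b and d come out very close to zero.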


Create a new Python application called FunctionApproximation.py. To be able to visualize the approximation of the sine wave by the polynomial, we will add a plotting function to the project. Add a file called Utils.py with the following code in it.


import numpy as np
import matplotlib.pyplot as plt

def plot_predicted_vs_actual(ypred, yactual):
	# mean absolute error over all points
	mean_abs_error = np.sum(np.abs(ypred - yactual)) / len(ypred)
	step_size = 20 # plot every 20th point
	a_pred = [ypred[i] for i in range(0, len(ypred)) if i % step_size == 0]
	b_actual = [yactual[i] for i in range(0, len(ypred)) if i % step_size == 0]
	t = np.linspace(0, len(a_pred), len(a_pred))
	# the lines carry the legend labels; the markers are drawn on top of them
	plt.plot(t, a_pred, 'red', linestyle='dashed', label='predicted')
	plt.plot(t, b_actual, 'blue', label='actual')
	plt.scatter(t, a_pred, marker='o', s=10, color='red')
	plt.scatter(t, b_actual, marker='o', s=10, color='blue')
	plt.legend()
	plt.title('mean absolute error = ' + str(mean_abs_error))
	plt.show()

Add a file to the project called FunctionApprox.py with the following code in it. The first version uses just NumPy to determine the a, b, c, and d values. Note that we are computing the gradients of the loss with respect to each variable ourselves. For example, the gradient of y_pred with respect to a is x**3.
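
For reference, the gradients that this code computes by hand follow from the chain rule applied to the summed squared error. The symbols L (the loss) and \hat{y}_i (the prediction for sample i) are notation introduced here and do not appear in the code.

```latex
\hat{y}_i = a x_i^3 + b x_i^2 + c x_i + d, \qquad
L = \sum_i (\hat{y}_i - y_i)^2

\frac{\partial L}{\partial \hat{y}_i} = 2(\hat{y}_i - y_i)

\frac{\partial L}{\partial a} = \sum_i 2(\hat{y}_i - y_i)\, x_i^3, \quad
\frac{\partial L}{\partial b} = \sum_i 2(\hat{y}_i - y_i)\, x_i^2, \quad
\frac{\partial L}{\partial c} = \sum_i 2(\hat{y}_i - y_i)\, x_i, \quad
\frac{\partial L}{\partial d} = \sum_i 2(\hat{y}_i - y_i)
```

In the listing, the quantity 2(\hat{y}_i - y_i) corresponds to grad_y_pred, and the four sums correspond to grad_a, grad_b, grad_c and grad_d.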


import sys
import numpy as np
import math
import Utils
def main():
	# Create input and output data: 2000 evenly spaced points of sin(x) on [-pi, pi]
	x = np.linspace(-math.pi, math.pi, 2000)
	y = np.sin(x)
	# randomly initialize weights
	a = np.random.randn()
	b = np.random.randn()
	c = np.random.randn()
	d = np.random.randn()
	learning_rate = 1e-6
	for t in range(2000):
		# Forward pass: compute predicted y
		# y = a x^3+ b x^2 + c x + d
		y_pred = a * x** 3 + b * x**2 + c * x + d 
		# Compute and print loss
		loss = np.square(y_pred - y).sum()
		if t % 100 == 99:
			print(t, loss)
		# Backprop to compute gradients of the loss with respect to a, b, c, d
		grad_y_pred = 2.0 * (y_pred - y)
		grad_d = grad_y_pred.sum()
		grad_c = (grad_y_pred * x).sum()
		grad_b = (grad_y_pred * x ** 2).sum()
		grad_a = (grad_y_pred * x ** 3).sum()
		# Update weights
		a -= learning_rate * grad_a
		b -= learning_rate * grad_b
		c -= learning_rate * grad_c
		d -= learning_rate * grad_d
	print(f'Result: y = {a} x^3+ {b} x^2 + {c} x + {d}')
 
	y_pred = a * x** 3 + b * x**2 + c * x + d 
	Utils.plot_predicted_vs_actual(y_pred, y)
if __name__ == "__main__":
	sys.exit(int(main() or 0))

If you run the program, the output will appear as: