Machine Learning For Biometric Recognition | Concepts and Technologies of Artificial Intelligence

realcode4you
Mar 3, 2023

Task

You will develop a Machine Learning (ML) solution for a biometric recognition task, aiming for the highest possible recognition accuracy. The facial images were taken from real subjects under slightly different conditions, so some images will be misclassified, which makes the ideal of 100% accurate recognition difficult or even impossible. You will design an ML solution that minimises the recognition errors.

Method and Technology

To minimise the error, you will use ML techniques such as Artificial Neural Networks (ANNs), which can be implemented on Google Colab, a platform that supports the programming languages commonly used for ML. Alternatively, advanced students can use other platforms and programming languages such as Python, MATLAB, or R. Advanced students may also be interested in high-performance ML techniques that are in demand on the market, such as Deep Learning, Convolutional Networks, and/or Gradient Boosting. Google Colab is the recommended platform; however, advanced students can use other Integrated Development Environments, e.g. Spyder.

Project Data and Scripts

The project's biometric data comprise facial images of 30 persons. Each person is represented by 50 images taken under different conditions, which gives 1500 images in total. If you use Colab, upload the data zip file to the root of your Google Drive, and upload the project scripts process_yale_images and classify_yale to your Colab project.

Alternatively, you can use other benchmark data sets available on Kaggle in this subject area. For example, students may be interested in the early detection of bone pathologies in X-ray images, which are available in a paper recently published in Scientific Reports.

Implementation

Import libraries

import os
from keras.models import Sequential
from keras.layers import  Dense, Activation, Dropout, Conv2D, Flatten, MaxPooling2D

#use the file path where your dataset is stored
data_path = "data\\Tr0"

from os import listdir # loads a library to work with directories 
fls = listdir(data_path) # creates a list of all image files
n = len(fls) # the number of the image files  
print('Number of images %i' % n)

from matplotlib import image # loads a library to work with images
from matplotlib import pyplot # loads a library to plot images
im1 = image.imread(data_path + '/' + fls[0]) # chooses 1st image from the image list
print(im1.shape) # prints the size in pixels of the chosen image 
pyplot.imshow(im1, cmap=pyplot.cm.gray) # displays the image
pyplot.show()

Output:

(77, 68)

import numpy as np  # loads a library for working with matrices
m = im1.shape[0] * im1.shape[1]  # m = h*w = 77*68 = 5236 is the number of pixels per image
images_data = np.zeros((n, m))  # creates an n x m matrix of the images
images_target = np.zeros((n,))  # creates a vector of n targets, which are the person labels 1 to 30
# loops over all n = 1500 images
for i in range(0, n):
  filename = fls[i]  # takes the name of the image file
  img = image.imread(data_path + '/' + filename)  # loads the image
  images_data[i,:] = np.ravel(img)  # vectorisation of the image
  c = int(filename[5:7])  # extracts the class label from the file name
  images_target[i] = c  # assigns the target
  if i % 10 == 0:
    print('> loaded %s %s %s' % (i, filename, c))  # prints the index, file name and label of every 10th image
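The two per-image steps in this loop, flattening the pixel grid and reading the label from the file name, can be checked in isolation. The sketch below is only an illustration: the helper names and the sample file name yaleB07_P00A+000E+00.pgm are assumptions (the post does not show the actual file names), chosen so that characters 5 and 6 hold a two-digit person number.

```python
import numpy as np

def flatten_image(img: np.ndarray) -> np.ndarray:
    """Turn an (h, w) image into a 1-D vector of h*w pixel values, row by row."""
    return np.ravel(img)

def label_from_filename(filename: str) -> int:
    """Read the two-digit person label stored at characters 5 and 6 of the name."""
    return int(filename[5:7])

# A dummy 77x68 array stands in for a real face image.
dummy = np.arange(77 * 68, dtype=float).reshape(77, 68)
vec = flatten_image(dummy)
print(vec.shape)                      # (5236,)
restored = vec.reshape(dummy.shape)   # reshaping recovers the original layout

# Hypothetical file name following the assumed "yaleBNN_..." pattern.
print(label_from_filename("yaleB07_P00A+000E+00.pgm"))  # 7
```

If your file names follow a different pattern, the slice [5:7] has to be adjusted, otherwise int() will raise an error or return the wrong label.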

from numpy import save  # loads the function for saving matrices
# save as .npy files
path = 'data'  # folder for the .npy files; the same folder is used when the data are loaded again
fn = path + '/' + 'yaleExtB_data.npy'  # creates the file name for the image data
save(fn, images_data)
fn = path + '/' + 'yaleExtB_target.npy'  # creates the file name for the targets
save(fn, images_target)
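Before moving on, it can be worth confirming that the saved arrays load back unchanged. A minimal sketch, assuming small stand-in arrays and a temporary folder instead of the real data directory:

```python
import os
import tempfile
import numpy as np

# Small stand-in arrays; the real ones have shapes (1500, 5236) and (1500,).
images_data = np.random.default_rng(0).random((6, 4))
images_target = np.array([1, 1, 2, 2, 3, 3], dtype=float)

with tempfile.TemporaryDirectory() as tmp:
    data_file = os.path.join(tmp, "yaleExtB_data.npy")
    target_file = os.path.join(tmp, "yaleExtB_target.npy")
    np.save(data_file, images_data)
    np.save(target_file, images_target)

    loaded_data = np.load(data_file)
    loaded_target = np.load(target_file)

# The round trip should preserve shape, dtype and values exactly.
round_trip_ok = (
    loaded_data.shape == images_data.shape
    and loaded_data.dtype == images_data.dtype
    and np.array_equal(loaded_data, images_data)
    and np.array_equal(loaded_target, images_target)
)
print(round_trip_ok)
```

Because the target vector is created with np.zeros, the labels are stored as floats (1.0 to 30.0). The classifier accepts them as they are; images_target.astype(int) converts them if integer labels are preferred.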

ANN Development

# Data load
import numpy as np
data = np.load('data\\yaleExtB_data.npy')
target = np.load('data\\yaleExtB_target.npy')

print(data.shape)
print(target.shape)

output:

(1500, 5236)
(1500,)

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

# split into a training and testing set 
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size = 0.2)

# PCA
nof_prin_component  = 200 # parameter optimization in experiments
pca = PCA(n_components=nof_prin_component,whiten=True).fit(X_train)
## Applies PCA to the train and test images to calculate the principal components
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
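The number of principal components is a parameter to tune in experiments. One common starting point is the smallest number of components whose cumulative explained variance reaches a chosen threshold. The sketch below computes this with a plain NumPy SVD on synthetic data; the function name and the 95% threshold are assumptions for illustration, not values from this project.

```python
import numpy as np

def components_for_variance(X: np.ndarray, threshold: float = 0.95) -> int:
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold`."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, largest first
    ratio = s**2 / np.sum(s**2)                  # explained-variance ratio per component
    cumulative = np.cumsum(ratio)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(s))

# Synthetic data: 100 samples, 20 features, but only 3 underlying directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 20))
k = components_for_variance(X, threshold=0.95)
print(k)  # at most 3, because the data have rank 3
```

With scikit-learn, the same ratios are available as pca.explained_variance_ratio_ after fitting, so np.cumsum(pca.explained_variance_ratio_) shows how much variance the chosen number of components keeps.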

# Train a neural network 
nohn = 200 ## number of hidden neurons
print("fitting the classifier to the train set")
clf = MLPClassifier(hidden_layer_sizes=(nohn,), solver='sgd', activation='tanh', batch_size=256, verbose=True, early_stopping=True).fit(X_train_pca, y_train)

output:

fitting the classifier to the train set

Iteration 1, loss = 3.75913680

Validation score: 0.016667

Iteration 2, loss = 3.73675458

Validation score: 0.016667

Iteration 3, loss = 3.70265436

Validation score: 0.016667

Iteration 4, loss = 3.66182735

Validation score: 0.025000

Iteration 5, loss = 3.61664907

...
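The validation scores in this log can be read as counts of correctly recognised images. With early_stopping=True, MLPClassifier sets aside part of the training data for validation (validation_fraction, which is 0.1 by default). Assuming the 80/20 split leaves 1200 training images, 120 of them are used for validation, so the printed scores correspond to 2 and 3 correct images. The short calculation below is a sketch of that arithmetic.

```python
n_images = 1500
n_train = n_images * 8 // 10   # 1200 images after the 80/20 split
n_val = n_train // 10          # 120 images held out by early stopping (default fraction, assumed)

for correct in (2, 3):
    print(correct, round(correct / n_val, 6))   # 0.016667 and 0.025

chance = 1 / 30                # random guessing among 30 persons
print(round(chance, 6))        # 0.033333
```

During these first iterations the network is therefore still at or below chance level, which is consistent with the loss having barely moved from its starting value.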

y_pred = clf.predict(X_test_pca) # predicts the person label for each test image
print(classification_report(y_test,y_pred)) # prints precision, recall and F1-score for each person
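To make the numbers in the report easier to interpret, the sketch below computes two of them by hand: the overall accuracy and the recall for each person (the share of that person's test images that were recognised correctly). The function name and the toy labels are illustrative assumptions, not results from the project.

```python
import numpy as np

def accuracy_and_recall(y_true, y_pred):
    """Return overall accuracy and a {label: recall} dictionary."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))
    recall = {}
    for label in np.unique(y_true):
        mask = y_true == label                      # test images of this person
        recall[int(label)] = float(np.mean(y_pred[mask] == label))
    return accuracy, recall

# Toy labels for three persons, not real results.
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
acc, rec = accuracy_and_recall(y_true, y_pred)
print(acc)   # 4 of 6 images are correct
print(rec)   # {1: 0.5, 2: 1.0, 3: 0.5}
```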


Third solution: increase the number of principal components to 400 and the number of hidden neurons to 2000

# PCA with 400 principal components
nof_prin_component = 400 # parameter optimization in experiments
pca = PCA(n_components=nof_prin_component, whiten=True).fit(X_train)
## Applies PCA to the train and test images to calculate the principal components
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

# Train a neural network
nohn = 2000 ## number of hidden neurons
print("fitting the classifier to the train set")
clf = MLPClassifier(hidden_layer_sizes=(nohn,), solver='adam', activation='relu', batch_size=512, verbose=True, early_stopping=False).fit(X_train_pca, y_train)

output:

fitting the classifier to the train set

Iteration 1, loss = 3.39286550

Iteration 2, loss = 1.90792329

Iteration 3, loss = 0.92820411

Iteration 4, loss = 0.37812898

Iteration 5, loss = 0.15449320

Iteration 6, loss = 0.07245658

Iteration 7, loss = 0.03933379

Iteration 8, loss = 0.02367513

...

y_pred = clf.predict(X_test_pca) # predicts the person label for each test image
print(classification_report(y_test,y_pred)) # prints precision, recall and F1-score for each person
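Going from 200 components and 200 hidden neurons to 400 components and 2000 hidden neurons makes the network much larger. The sketch below counts the trainable parameters of a single-hidden-layer network, assuming one output unit per person (30); the function name is an assumption for illustration.

```python
def mlp_parameter_count(n_inputs: int, n_hidden: int, n_classes: int) -> int:
    """Weights plus biases of an MLP with a single hidden layer."""
    hidden = n_inputs * n_hidden + n_hidden      # input -> hidden
    output = n_hidden * n_classes + n_classes    # hidden -> output
    return hidden + output

first = mlp_parameter_count(200, 200, 30)     # 200 components, 200 neurons
larger = mlp_parameter_count(400, 2000, 30)   # 400 components, 2000 neurons
print(first)    # 46230
print(larger)   # 862030
```

The larger network has roughly 19 times as many parameters, while the training set is no bigger. A training loss that falls quickly towards zero is therefore expected and is not, on its own, evidence of better recognition; the test report is the number to compare.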


Split data with a test size of 10%

# split into a training and testing set 
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size = 0.1)

# PCA with 400 principal components
nof_prin_component = 400 # parameter optimization in experiments
pca = PCA(n_components=nof_prin_component, whiten=True).fit(X_train)
## Applies PCA to the train and test images to calculate the principal components
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

# Train a neural network
nohn = 2000 ## number of hidden neurons
print("fitting the classifier to the train set")
clf = MLPClassifier(hidden_layer_sizes=(nohn,), solver='adam', activation='relu', batch_size=512, verbose=True, early_stopping=False).fit(X_train_pca, y_train)

output:

fitting the classifier to the train set

Iteration 1, loss = 3.29942160

Iteration 2, loss = 1.81556445

Iteration 3, loss = 0.81590870

Iteration 4, loss = 0.30386142

Iteration 5, loss = 0.11853102

Iteration 6, loss = 0.05518821

Iteration 7, loss = 0.03041754

Iteration 8, loss = 0.01880189

...

y_pred = clf.predict(X_test_pca) # predicts the person label for each test image
print(classification_report(y_test,y_pred)) # prints precision, recall and F1-score for each person
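The test fraction decides how many images the report is computed on, and a smaller test set gives a noisier accuracy estimate. The sketch below shows the test-set size for three fractions and the standard error of an accuracy estimate; the function names are assumptions, and the accuracy of 0.9 is an arbitrary example value, not a result from this project.

```python
import math

def n_test_images(n_images: int, fraction: float) -> int:
    """Number of images that end up in the test set."""
    return round(n_images * fraction)

def accuracy_std_error(p: float, n_test: int) -> float:
    """Standard error of an accuracy estimate p measured on n_test images."""
    return math.sqrt(p * (1 - p) / n_test)

for fraction in (0.2, 0.1, 0.01):
    n_test = n_test_images(1500, fraction)
    # 0.9 is an example accuracy used only to illustrate the formula.
    print(fraction, n_test, round(accuracy_std_error(0.9, n_test), 3))
```

With 30 persons, a test set of 15 images cannot even contain one image of every person, so very small test fractions make the per-person figures in the report unreliable.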


Second solution

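# split into a training and testing set
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size = 0.2)

# Refit PCA on the new training set so that the features match the new split
pca = PCA(n_components=nof_prin_component, whiten=True).fit(X_train)
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

# Train a neural network
nohn = 500 ## number of hidden neurons
print("fitting the classifier to the train set")
clf = MLPClassifier(hidden_layer_sizes=(nohn,), solver='sgd', activation='tanh', batch_size=256, verbose=True, early_stopping=False).fit(X_train_pca, y_train)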

output:

fitting the classifier to the train set

Iteration 1, loss = 3.64457916

Iteration 2, loss = 3.63758783

Iteration 3, loss = 3.62717654

Iteration 4, loss = 3.61488782

Iteration 5, loss = 3.60121313

Iteration 6, loss = 3.58709699

...

y_pred = clf.predict(X_test_pca) # predicts the person label for each test image
print(classification_report(y_test,y_pred)) # prints precision, recall and F1-score for each person
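Every new train/test split needs PCA to be refitted on the new training set before the classifier is trained, otherwise the features and the labels no longer belong to the same images. One way to keep the two steps together is a scikit-learn pipeline. The sketch below uses synthetic data rather than the face images; the stratify and random_state arguments, the layer size and the learning rate are assumptions added for illustration and are not part of the original experiments.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data: 3 well-separated classes, 40 samples each, 20 features.
rng = np.random.default_rng(0)
centres = rng.normal(scale=5.0, size=(3, 20))
X = np.vstack([c + rng.normal(size=(40, 20)) for c in centres])
y = np.repeat([1, 2, 3], 40)

# stratify keeps the class proportions equal in both parts;
# random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# PCA is fitted inside the pipeline, on the training part only.
model = make_pipeline(
    PCA(n_components=5, whiten=True),
    MLPClassifier(hidden_layer_sizes=(32,), solver="adam",
                  learning_rate_init=0.01, max_iter=500, random_state=0),
)
model.fit(X_train, y_train)
score = model.score(X_test, y_test)   # accuracy on the held-out part
print(score)
```

Calling model.fit again after a new split refits both PCA and the network, so stale features from an earlier split cannot be reused by accident.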


Keeping everything else the same and changing only the solver from 'sgd' to 'adam'

# Train a neural network 
nohn = 200 ## number of hidden neurons
print("fitting the classifier to the train set")
clf = MLPClassifier(hidden_layer_sizes=(nohn,), solver='adam', activation='tanh', batch_size=256, verbose=True, early_stopping=False).fit(X_train_pca, y_train)

output:

fitting the classifier to the train set

Iteration 1, loss = 3.74825815

Iteration 2, loss = 3.50373783

Iteration 3, loss = 3.31563171

Iteration 4, loss = 3.14246663

Iteration 5, loss = 2.98634438

Iteration 6, loss = 2.83988810

Iteration 7, loss = 2.70424178

Iteration 8, loss = 2.57771883

...

y_pred = clf.predict(X_test_pca) # predicts the person label for each test image
print(classification_report(y_test,y_pred)) # prints precision, recall and F1-score for each person

