Computer Vision Assignment Help | Practice Set 2

realcode4you
Mar 8, 2022

Question 1

Write a function match_features(features1, features2, x1, y1, x2, y2, threshold) to implement the "ratio test" or "nearest neighbor distance ratio test" method of matching two sets of local features: features1 at the locations (x1, y1) and features2 at the locations (x2, y2), as described in the lecture materials and in Section 7.1.3 of the 2nd edition of Szeliski's book.

The parameters features1 and features2 are numpy arrays of shape k*128, each representing one set of features. x1 and x2 are two numpy arrays of shape k*1 containing the x-locations of features1 and features2, respectively. y1 and y2 are two numpy arrays of shape k*1 containing the y-locations of features1 and features2, respectively. Your function should return two outputs: matches and confidences, where matches is a numpy array of shape n*2 and n is the number of matches. The first column of matches is an index into features1, and the second column is an index into features2. confidences is a numpy array of shape n*1 with the real-valued confidence for every match.

This function does not need to be symmetric (i.e. it can produce different numbers of matches depending on the order of the arguments). To start with, simply implement the "ratio test", equation 7.18 in section 7.1.3 of Szeliski. There are a lot of repetitive features in these images, and all of their descriptors will look similar, so the ratio test is useful for rejecting such ambiguous matches.

Inputs

features1 is a 2 dimensional numpy array of data type float64 with shape m*d .
features2 is a 2 dimensional numpy array of data type float64 with shape n*d.
x1 is a 2 dimensional numpy array of data type float64 with shape m*1.
y1 is a 2 dimensional numpy array of data type float64 with shape m*1.
x2 is a 2 dimensional numpy array of data type float64 with shape n*1.
y2 is a 2 dimensional numpy array of data type float64 with shape n*1.
threshold is a real number of data type float64 .

Outputs

matches is a 2 dimensional numpy array of data type int64 .
confidences is a 1 dimensional numpy array of data type float64 .

Data

You can tune your algorithm on the images at data/notre_dame_1.jpg and data/notre_dame_2.jpg, with interest points at data/notre_dame_1_to_notre_dame_2.pkl, and also on the images at data/mount_rushmore_1.jpg and data/mount_rushmore_2.jpg, with interest points at data/mount_rushmore_1_to_mount_rushmore_2.pkl. Note that the corresponding points within the pickle files are the matching points.

#Feature matching
def match_features(features1, features2, x1, y1, x2, y2, threshold=1.0):
 #code here
 raise NotImplementedError()
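
As a rough illustration of the ratio test described above, here is a minimal sketch. The function name match_features_sketch, the choice of confidence as 1 minus the distance ratio, and the descending sort by confidence are assumptions made for this example, not part of the assignment. The location arguments are accepted only to mirror the required signature and are not used, and the sketch assumes features2 has at least two rows. It returns confidences as a 1 dimensional array, following the Outputs list.

```python
import numpy as np


def match_features_sketch(features1, features2, x1, y1, x2, y2, threshold=1.0):
    """Nearest neighbor distance ratio test (illustrative sketch only)."""
    # Pairwise Euclidean distances between descriptors, shape (m, n).
    sq = (
        np.sum(features1 ** 2, axis=1)[:, None]
        + np.sum(features2 ** 2, axis=1)[None, :]
        - 2.0 * features1 @ features2.T
    )
    dists = np.sqrt(np.maximum(sq, 0.0))

    # For each feature in features1, find its two nearest neighbors in features2.
    order = np.argsort(dists, axis=1)
    nearest, second = order[:, 0], order[:, 1]
    rows = np.arange(features1.shape[0])
    d1 = dists[rows, nearest]
    d2 = dists[rows, second]

    # Ratio of nearest to second-nearest distance; small values mean distinctive matches.
    ratio = d1 / np.maximum(d2, 1e-12)
    keep = ratio < threshold

    matches = np.stack([rows[keep], nearest[keep]], axis=1).astype(np.int64)
    # Assumption: confidence is defined as 1 - ratio (higher is better).
    confidences = 1.0 - ratio[keep]

    # Return the most confident matches first.
    idx = np.argsort(-confidences)
    return matches[idx], confidences[idx]
```

Lowering threshold keeps fewer but more distinctive matches; with the default of 1.0 almost every feature in features1 gets a match.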

Question 2

Write a function find_affine_transform(x1, y1, x2, y2) which returns the homogeneous affine transformation matrix from (x1, y1) to (x2, y2), where (x1, y1) and (x2, y2) are the 2 dimensional corresponding/matching points from two different images. The technique for computing the transformation matrix was covered in the lectures; it approximates a generic affine transformation matrix and can be computed with the help of homogeneous coordinates.

Inputs

x1, y1, x2, y2 are 2 dimensional numpy arrays of shape N*1 of data type float64.

Outputs

This function should return a 2 dimensional numpy array of shape 3*3 of data type float64.

Data

You can use the matching points at data/notre_dame_1_to_notre_dame_2.pkl for tuning your algorithm.

# Affine transformation
def find_affine_transform(x1, y1, x2, y2):
    # x1, y1, x2, y2 are numpy arrays of shape Nx1
    # T is a 3x3 numpy array mapping [x1, y1, 1] to [x2, y2, 1]
    x1, y1 = np.ravel(x1), np.ravel(y1)
    x2, y2 = np.ravel(x2), np.ravel(y2)
    # Homogeneous source points, one row per correspondence (N x 3)
    P = np.column_stack((x1, y1, np.ones_like(x1)))
    # Target points (N x 2)
    Q = np.column_stack((x2, y2))
    # Least-squares solution of P @ M = Q; M (3 x 2) holds the top two rows of T, transposed
    M, _, _, _ = np.linalg.lstsq(P, Q, rcond=None)
    T = np.eye(3)
    T[:2, :] = M.T
    return T
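
One common way to pose this problem, assuming the lectures use the standard least-squares formulation (the symbols a, b, c, d, t_x, t_y are introduced here only for illustration), is to write the mapping in homogeneous coordinates and minimise the squared residual over all N correspondences:

```latex
\begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix}
\approx
\begin{bmatrix} a & b & t_x \\ c & d & t_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix},
\qquad
\min_{a,b,c,d,t_x,t_y} \sum_{i=1}^{N}
\left\|
\begin{bmatrix} x_{2,i} \\ y_{2,i} \end{bmatrix}
-
\begin{bmatrix} a & b & t_x \\ c & d & t_y \end{bmatrix}
\begin{bmatrix} x_{1,i} \\ y_{1,i} \\ 1 \end{bmatrix}
\right\|^2
```

The problem has six unknowns, so at least three non-collinear correspondences are needed; with more points the solution is a least-squares approximation.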

Question 3

Write a function make_bovw_spatial_histogram(im, locations, clusters, division) to create the bag of visual words representation of an image im whose features are located at locations and whose quantized feature labels are stored in clusters. You have to build the histogram based on the division information provided in division. For example, if division = [2, 3], imagine dividing the image into 2 parts along the Y-axis and 3 parts along the X-axis (as shown in the rightmost figure below); if division = [2, 2], imagine dividing the image into 2 parts along both axes; and if division = [1, 1], just compute the bag-of-visual-words histogram on the entire image without dividing it into parts.

Inputs

im is a 3 dimensional numpy array of data type uint8 .
locations is a 2 dimensional numpy array of data type int64, each row of which is the Cartesian coordinate of one feature.
clusters is a 1 dimensional numpy array of data type int64, each element of which is the quantized cluster id of the corresponding feature.
division is a list of integer of length 2.

Outputs

This function should return a 1 dimensional numpy array of data type int64 .

Data

There is no specific data for this question. However, you can create data on one of the images available inside the data folder.

There are four test cases which will call the above function to calculate bag-of-visual-words spatial histograms on the image im, using coarse and fine divisions that are provided when the function is called. In each test case, your spatial histogram should exactly match the correct spatial histogram. Coarser test cases carry less weight than their finer counterparts.

# Spatial bag of visual words histogram
def make_bovw_spatial_histogram(im, locations, clusters, division):
 #YOUR CODE HERE
 raise NotImplementedError()
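
The description above leaves several conventions open, so the following is only a sketch under stated assumptions: each row of locations is taken to be (x, y), division is taken to be [parts along Y, parts along X], cells are visited in row-major order with their histograms concatenated, and the vocabulary size defaults to max(clusters) + 1. The name bovw_spatial_histogram_sketch and the n_words parameter are hypothetical and not part of the assignment; the graded ordering may differ.

```python
import numpy as np


def bovw_spatial_histogram_sketch(im, locations, clusters, division, n_words=None):
    """Concatenated per-cell bag-of-visual-words histogram (illustrative sketch)."""
    h, w = im.shape[:2]
    n_rows, n_cols = division  # assumption: [parts along Y, parts along X]
    if n_words is None:
        # Assumption: the vocabulary size is inferred from the largest cluster id.
        n_words = int(clusters.max()) + 1

    x = locations[:, 0]  # assumption: first column is x
    y = locations[:, 1]  # assumption: second column is y

    # Grid cell of every feature; clamp so points on the far border stay inside.
    row = np.minimum(y * n_rows // h, n_rows - 1)
    col = np.minimum(x * n_cols // w, n_cols - 1)
    cell = row * n_cols + col

    # Each (cell, word) pair gets its own bin in one flat histogram.
    flat = cell * n_words + clusters
    hist = np.bincount(flat, minlength=n_rows * n_cols * n_words)
    return hist.astype(np.int64)
```

With division = [1, 1] every feature falls into the single cell, so the result reduces to the ordinary bag-of-visual-words histogram.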

import os
import numpy as np
from sklearn.svm import SVC
from sklearn import svm, datasets
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.utils.multiclass import unique_labels
from sklearn.model_selection import train_test_split
# opencv contrib version is needed because of sift (issues related to patent)
!pip3 install opencv-contrib-python==4.4.0.42
import cv2

def extractFeatures(kmeans, descriptor_list, image_count, no_clusters):
    # One histogram of visual-word counts per image
    im_features = np.zeros((image_count, no_clusters))
    for i in range(image_count):
        for j in range(len(descriptor_list[i])):
            feature = descriptor_list[i][j]
            feature = feature.reshape(1, 128)
            # Assign the descriptor to its nearest cluster centre
            idx = kmeans.predict(feature)
            im_features[i][idx] += 1
    return im_features

Question 4

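Write a function histogram_intersection_kernel(X, Y) to compute the Histogram Intersection Kernel, which is also known as the Min Kernel and is calculated by

$$K(x, y) = \sum_{i=1}^{d} \min(x_i, y_i)$$

where x is a row of X, y is a row of Y, and d is the feature dimension.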

Data

There is no specific data for this question. However, you can create your own data X and Y satisfying the input criteria.

# Histogram intersection kernel
def histogram_intersection_kernel(X, Y):
    # K[i, j] is the sum of element-wise minima of row i of X and row j of Y
    K = np.zeros((X.shape[0], Y.shape[0]), dtype=np.float64)
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            K[i, j] = np.sum(np.minimum(x, y))
    return K
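
The double loop can also be written with NumPy broadcasting. The sketch below is an alternative formulation of the same kernel, not part of the assignment; the name histogram_intersection_kernel_vectorized is made up for this example. It builds an intermediate array of shape (m, n, d), so it trades memory for speed.

```python
import numpy as np


def histogram_intersection_kernel_vectorized(X, Y):
    """Min kernel between every row of X (m x d) and every row of Y (n x d)."""
    X = np.asarray(X, dtype=np.float64)
    Y = np.asarray(Y, dtype=np.float64)
    # Broadcast (m, 1, d) against (1, n, d), take element-wise minima,
    # then sum over the feature axis to obtain an (m, n) matrix.
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)
```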

Question 5

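Write a function generalized_histogram_intersection_kernel(X, Y, alpha) to compute the Generalized Histogram Intersection Kernel, which is computed by

$$K(x, y) = \sum_{i=1}^{d} \min(|x_i|^{\alpha}, |y_i|^{\alpha})$$

where x is a row of X, y is a row of Y, and d is the feature dimension.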

Data

There is no specific data for this question. However, you can create your own data X and Y satisfying the input criteria

# Generalized histogram intersection kernel
def generalized_histogram_intersection_kernel(X, Y, alpha):
	# YOUR CODE HERE
	#return np.sum(np.minimum(np.abs(X)**alpha, np.abs(Y)**alpha), axis=1)
	#return np.dot(X, Y.T) * np.sum(np.minimum(np.sum(X, axis=1)[:, None], np.sum(Y, axis=1))[:, None], 
 raise NotImplementedError()
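
A possible pairwise implementation is sketched below. It assumes the usual definition of the generalized kernel, in which absolute values are raised to the power alpha before the element-wise minimum is taken, and it assumes the expected output is an (m, n) matrix as in Question 4. The function name is hypothetical.

```python
import numpy as np


def generalized_histogram_intersection_kernel_sketch(X, Y, alpha):
    """Generalized min kernel between rows of X (m x d) and rows of Y (n x d)."""
    # Raise absolute values to the power alpha before intersecting.
    Xa = np.abs(np.asarray(X, dtype=np.float64)) ** alpha
    Ya = np.abs(np.asarray(Y, dtype=np.float64)) ** alpha
    # Broadcast to (m, n, d), take minima, and sum over the feature axis.
    return np.minimum(Xa[:, None, :], Ya[None, :, :]).sum(axis=2)
```

With alpha = 1 and non-negative inputs this reduces to the plain histogram intersection kernel.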

Question 6

Write a function train_gram_matrix(X_tr, X_te) which will compute the train gram matrix using the Histogram Intersection Kernel implemented above.

# Train gram matrix
def train_gram_matrix(X_tr, X_te):
	# YOUR CODE HERE
 raise NotImplementedError()

Question 7

Write a function test_gram_matrix(X_tr, X_te) which will compute the test gram matrix using the Histogram Intersection Kernel implemented above.

# Test gram matrix
def test_gram_matrix(X_tr, X_te):
 # YOUR CODE HERE
 raise NotImplementedError()

Question 8

# 3D reconstruction
def reconstruct_3d(p1, p2, R1, R2, T1, T2):
    # Stack both rotations into one linear system A @ X = b
    A = np.concatenate((R1, R2), axis=0)
    b = np.concatenate((np.transpose(p1 - T1), np.transpose(p2 - T2)), axis=0)
    # Solve in the least-squares sense with the pseudo-inverse
    p3D = np.transpose(np.matmul(np.linalg.pinv(A), b))
    return p3D

Question 9

Write a function train_cnn(model, train_loader) to train the following version of the Residual Network (ResNet) model on the EXCV10 (Exeter Computer Vision 10) dataset, available at https://empslocal.ex.ac.uk/people/staff/ad735/ECMM426/EXCV10.zip.

At the end of the training, this function should save the best weights of the trained CNN at: data/weights_resnet.pth. The EXCV10 dataset contains 10000 images from 10 classes, which are further split into train (available in the train/ folder; 8000 images in total, 800 images/class) and validation (available in the val/ folder; 2000 images in total, 200 images/class) sets. When training your model, feel free to choose your own hyperparameters within the function, such as the number of epochs, the type of optimiser and the learning rate scheduler, in order to optimise the performance of the model on the validation set.

Inputs

model is an instantiation of ResNet class which can be created as follows: ResNet(block=BasicBlock, layers=[1, 1, 1], num_classes=num_classes) . An example of this can be found in the snippet in the following cell. train_loader is the training data loader. You can create the dataset and data loader for your training following the example in the cell below. Feel free to try other data augmentation and regularization techniques to train a better model.

Outputs

This function does not need to return any output; instead, it should save your best model at data/weights_resnet.pth.

Data

You can train your model on the data available at https://empslocal.ex.ac.uk/people/staff/ad735/ECMM426/EXCV10.zip. As the EXCV10 dataset is quite large, do not upload it with your submission.
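
A bare-bones PyTorch training loop for this task might look like the sketch below. It is not the reference solution: the cross-entropy loss, the SGD optimiser, the default hyperparameters, and the epochs, lr and save_path parameters are all assumptions made for illustration. Because only train_loader is passed in, "best" is judged here by the lowest average training loss per epoch; a validation loader created inside the function would be a better criterion.

```python
import os

import torch
import torch.nn as nn


def train_cnn_sketch(model, train_loader, epochs=2, lr=0.01,
                     save_path="data/weights_resnet.pth"):
    """Train a classifier and keep the weights of the best epoch (sketch)."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)

    # Make sure the target folder exists before saving.
    save_dir = os.path.dirname(save_path)
    if save_dir:
        os.makedirs(save_dir, exist_ok=True)

    best_loss = float("inf")
    for epoch in range(epochs):
        model.train()
        running, seen = 0.0, 0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            running += loss.item() * images.size(0)
            seen += images.size(0)

        # Assumption: the best epoch is the one with the lowest mean training loss.
        epoch_loss = running / max(seen, 1)
        if epoch_loss < best_loss:
            best_loss = epoch_loss
            torch.save(model.state_dict(), save_path)
    return best_loss
```

The saved file contains only the state dictionary, so it can be restored with model.load_state_dict(torch.load(save_path)).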

Contact us for help with computer vision problems and projects. Our experts have deep knowledge of all computer vision topics.