K Nearest Neighbor(KNN) In Machine Learning | Machine Learning Homework Help | Realcode4you

KNN can be used for both classification and regression predictive problems. However, it is more widely used in classification problems in the industry.

It is one of the simplest machine learning algorithms and is based on the supervised learning technique.

The K-NN algorithm stores all the available data and classifies a new data point based on its similarity to the stored points.

This means that when new data appears, the K-NN algorithm can easily assign it to the most suitable category.

Steps to do it:

Step-1: Select the number K of the neighbours

Step-2: Calculate the Euclidean distance from the new data point to every point in the training data.

Step-3: Take the K nearest neighbours as per the calculated Euclidean distance.

Step-4: Among these k neighbours, count the number of the data points in each category.

Step-5: Assign the new data point to the category that has the largest number of neighbours.

Step-6: Our model is ready.
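The steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the scikit-learn implementation: the function name `knn_predict` and the toy data are made up for this example.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points.

    Hypothetical helper written for illustration only.
    """
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)

    # Step 2: Euclidean distance from the new point to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))

    # Step 3: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]

    # Step 4: count how many of those neighbours fall in each category
    votes = Counter(y_train[i] for i in nearest)

    # Step 5: the category with the most neighbours wins
    return votes.most_common(1)[0][0]


# Toy data (invented for this sketch): two small, well-separated clusters
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array(["A", "A", "A", "B", "B", "B"])

label_near_a = knn_predict(X_toy, y_toy, [1.1, 1.0], k=3)
label_near_b = knn_predict(X_toy, y_toy, [5.1, 5.0], k=3)
print(label_near_a, label_near_b)
```

Note that there is no real training phase: the "model" is simply the stored data, and all the work happens when a new point is classified.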

The “new data point” is the point whose category we want to determine.

To find the category this point belongs to, we measure how close it is to the training points using the Euclidean distance formula.

Euclidean distance formula:
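For reference, the standard Euclidean distance between two points with n features can be written as follows. The symbols p, q, and n are notation chosen here for illustration.

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2},
\qquad p = (p_1, \dots, p_n), \quad q = (q_1, \dots, q_n)
```

For the Iris data used in the example, n = 4 (sepal length, sepal width, petal length, petal width).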

Example: Using sklearn

#importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
 
#Reading data
path = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
 
#Assign column names
col_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
 
#Creating pandas dataframe
dataset = pd.read_csv(path, names = col_names)
dataset.head()

Selecting the feature columns (X) and the target column (y):

X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

#Split data
#taking 60% of the data for training and 40% for testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.40)

#standardize the numeric features so that each one contributes equally to the distance
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

After splitting the dataset into training and test sets, we instantiate the k-nearest neighbours classifier. Here we use k = 8; you may vary the value of k and observe how the result changes.

#create the classifier
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 8)

Next, we fit the classifier to the training data using the ‘fit’ function:

classifier.fit(X_train, y_train)

Predicting the test data:

y_pred = classifier.predict(X_test)

Finding the score with a confusion matrix:
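A possible way to compute the confusion matrix and accuracy is sketched below. To keep it self-contained, it repeats the earlier steps and makes two assumptions that are not in the original post: it loads scikit-learn's bundled copy of the Iris data with `load_iris` instead of reading the UCI URL, and it fixes `random_state=0` so the split is repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: bundled Iris data rather than the UCI download used above
X, y = load_iris(return_X_y=True)

# Same 60/40 split as above; random_state is an added assumption
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.40, random_state=0
)

# Fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

classifier = KNeighborsClassifier(n_neighbors=8).fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=classifier.classes_)
acc = accuracy_score(y_test, y_pred)

print(cm)
print("Accuracy:", acc)
```

The diagonal of the matrix counts correct predictions per class, so the accuracy equals the sum of the diagonal divided by the total number of test points.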

Another method to determine the optimal K in KNN:

# loading libraries
from sklearn.neighbors import KNeighborsClassifier
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import cross_val_score

# testing only k = 1 to 5 to reduce running time; widen the range (e.g. up to 50) for a fuller search
neighbors = list(range(1,6))

# empty list that will hold cv scores
cv_scores = []

# perform 10-fold cross validation
for k in neighbors:
    # instantiate learning model 
    knn = KNeighborsClassifier(n_neighbors=k)
    # pass the unfitted estimator; cross_val_score fits a fresh copy on each fold
    scores = cross_val_score(knn, X_train, y_train, cv=10, scoring='accuracy')
    cv_scores.append(scores.mean())

# changing to misclassification error (1 - accuracy)
import matplotlib.pyplot as plt
errors = [1 - x for x in cv_scores]

# Jupyter/IPython only; remove this line when running as a plain Python script
%matplotlib inline

# determining best k
optimal_k = neighbors[errors.index(min(errors))]
print("The optimal number of neighbors is %d" % optimal_k)

# plot misclassification error vs k
plt.plot(neighbors, errors)
plt.xlabel('Number of Neighbors K')
plt.ylabel('Misclassification Error')
plt.show()

Finding the F1 score on the test set for different values of k:

# compute the F1-score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score
f1_scores = []

# trying k = 1 to 4
neighbors = list(range(1,5))
for k in neighbors:
    # instantiate learning model 
    knn = KNeighborsClassifier(n_neighbors=k)
    
    # fitting the model
    knn.fit(X_train, y_train)

    # predict the response
    y_pred = knn.predict(X_test)
    
    # f1_score based on k
    f1_scores.append(f1_score(y_test, y_pred, average='micro'))

print(f1_scores)
