What Is the K-Nearest Neighbors Algorithm in Machine Learning | Realcode4you

realcode4you
Jan 15, 2023

Before learning about K-Nearest Neighbors, we first need to understand supervised and unsupervised machine learning algorithms.

Unsupervised Learning

Organize a collection of unlabeled data items into categories.

The instances are unlabeled, and the goal is to organize a collection of data items into categories.
The items within a category are more similar to each other than they are to items in other categories.

Clustering is also a good approach to anomaly detection.

Example: K-means
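
As a rough illustration of clustering, the sketch below runs base R's kmeans() on the four iris measurements without using the species labels. The seed, the choice of 3 clusters, and nstart = 20 are assumptions made for this example only.

```r
# Minimal k-means sketch on the built-in iris measurements.
set.seed(42)                # assumption: fixed seed for repeatable clusters
features <- iris[, 1:4]     # drop the Species column: clustering uses no labels

# centers = 3 and nstart = 20 are assumptions chosen for this illustration
fit <- kmeans(features, centers = 3, nstart = 20)

# Compare the discovered clusters with the held-back species labels
table(cluster = fit$cluster, species = iris$Species)
```

The table is only a sanity check: the algorithm never sees the species column, so the cluster numbers themselves carry no meaning.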

Supervised Learning

Predict the relationship between objects and class labels (the hypothesis).

Each object is labeled with a class.
The goal is to find the predictive relationship between objects and class labels (the hypothesis).

Examples:

K-NN (K-Nearest Neighbors)
Decision Trees (ID3, C4.5)
SVM (Support Vector Machines)
ANN (Artificial Neural Network)
NB (Naive Bayes)

K-Nearest-Neighbors Algorithm

K-nearest neighbors (KNN) is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (a distance function).
KNN has been used in statistical estimation and pattern recognition since the 1970s.
A case is classified by a majority vote of its neighbors: it is assigned to the class most common among its K nearest neighbors, as measured by the distance function.
If K=1, the case is simply assigned to the class of its nearest neighbor.
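
The majority-vote rule described above can be sketched in a few lines of base R. The helper name knn_predict_one and the toy data are hypothetical, made up for illustration, and ties are broken naively; this is a sketch of the idea, not how any particular package implements it.

```r
# Classify one new point by majority vote among its k nearest training points.
# knn_predict_one is a hypothetical helper, not part of any package.
knn_predict_one <- function(train_x, train_y, new_x, k = 3) {
  # Euclidean distance from new_x to every training row
  diffs <- sweep(as.matrix(train_x), 2, as.numeric(new_x))
  dists <- sqrt(rowSums(diffs^2))
  # Labels of the k closest rows
  nearest <- order(dists)[seq_len(k)]
  votes <- table(train_y[nearest])
  # Most common label (a tie goes to the first label in table order)
  names(votes)[which.max(votes)]
}

# Toy data (made up): two well-separated groups
train_x <- data.frame(x = c(1, 1.2, 0.8, 5, 5.1, 4.9),
                      y = c(1, 0.9, 1.1, 5, 4.8, 5.2))
train_y <- c("A", "A", "A", "B", "B", "B")

pred_k3 <- knn_predict_one(train_x, train_y, c(1.1, 1.0), k = 3)
pred_k1 <- knn_predict_one(train_x, train_y, c(4.8, 5.0), k = 1)
```

Note that nothing is "trained" here: the function only stores the data and does all of its work when a new point arrives.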

Features

All instances correspond to points in an n-dimensional Euclidean space
Classification is delayed until a new instance arrives
Classification is done by comparing the feature vectors of the different points
Target function may be discrete or real-valued

- Instance-based learning algorithm

- Lazy learner: needs more computation time during classification

- Conceptually close to human intuition: e.g., people with similar income would live in the same neighborhood

Classification strategy:

K-NN assigns a new instance to a class by identifying the most frequent class label among its K nearest neighbors.

When the attributes are numeric, a distance measure is required to determine which neighbors are nearest, e.g., Euclidean distance.
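
For numeric attributes, the Euclidean distance is the square root of the summed squared differences between two feature vectors. A small worked sketch follows; the function name euclidean and the two points are chosen only for illustration.

```r
# Euclidean distance between two numeric feature vectors.
# euclidean is a hypothetical helper name used for this example.
euclidean <- function(a, b) sqrt(sum((a - b)^2))

p <- c(1, 2)
q <- c(4, 6)
d <- euclidean(p, q)   # sqrt(3^2 + 4^2) = 5
d

# Base R's dist() gives the same value for the rows of a matrix
as.numeric(dist(rbind(p, q)))
```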

KNN Example

Similarity metric: Number of matching attributes (k=2)
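
A matching-attributes similarity with k=2 might look like the sketch below. The cases, attribute names, and labels are entirely hypothetical, since the original example data is not available here; with an even k, ties between classes are possible and would need a tie-breaking rule.

```r
# Hypothetical labelled cases described by three categorical attributes
cases <- data.frame(
  colour = c("red", "red", "blue", "blue"),
  size   = c("small", "small", "large", "large"),
  shape  = c("round", "square", "square", "round"),
  stringsAsFactors = FALSE
)
labels <- c("yes", "yes", "no", "no")

new_case <- c(colour = "red", size = "small", shape = "round")

# Similarity = number of attributes whose values match the new case
matches <- unname(apply(cases, 1, function(row) sum(row == new_case)))

# Take the k = 2 most similar cases and vote on their labels
k <- 2
nearest <- order(matches, decreasing = TRUE)[seq_len(k)]
votes <- table(labels[nearest])
prediction <- names(votes)[which.max(votes)]
```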

Selecting the Number of Neighbors

- Increase k: makes KNN less sensitive to noise

- Decrease k: allows capturing the finer structure of the space

- Pick a k that is neither too large nor too small (the right value depends on the data)
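
One common way to choose k is to try a few values and compare accuracy on held-out data. The sketch below does this on iris with knn() from the class package; the seed, the 100-row training split, and the candidate values of k are all assumptions, and the features are left unscaled to keep the example short.

```r
library(class)   # provides knn(); usually bundled with R as a recommended package

set.seed(1)                         # assumption: fixed seed
idx   <- sample(nrow(iris), 100)    # assumption: 100 rows for training
train <- iris[idx, 1:4]
test  <- iris[-idx, 1:4]

# Hold-out accuracy (%) for a few candidate values of k
ks  <- c(1, 5, 10, 25, 50)
acc <- sapply(ks, function(k) {
  pred <- knn(train, test, cl = iris$Species[idx], k = k)
  mean(pred == iris$Species[-idx]) * 100
})
data.frame(k = ks, accuracy = acc)
```

A single split gives a noisy estimate, so in practice the comparison is often repeated over several splits or done with cross-validation.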

Advantages and Disadvantages of KNN

1. Need distance/similarity measure and attributes that “match” target function.

2. For large training sets, each classification requires a pass through the entire dataset, which can be prohibitively slow.

3. Prediction accuracy can quickly degrade when the number of attributes grows.

Using K-NN in R

Case study: Iris data set

Load your data

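# data() loads the built-in iris data frame into the workspace
# (it returns only the dataset name, so its result is not assigned)
data(iris)

# look at the data structure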
head(iris) 
str(iris)
dim(iris)

Generate a random sample of all data

# Generate a random sample of row indices,
# in this case 82% of the dataset.
# round() guards against floating-point truncation of the sample size
randSelection <- sample(1:nrow(iris), round(0.82 * nrow(iris)))
randSelection

Normalization

# min-max normalization function: rescales x to the range [0, 1]
normalization <- function(x) { (x - min(x)) / (max(x) - min(x)) }

# Run normalization on the columns that are the predictors

irisNormalized <- as.data.frame(lapply(iris[,c(1:4)], normalization))

summary(irisNormalized)
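
To see what min-max scaling does, here is a small worked example on a made-up vector. The function is given a separate name, min_max, only so that the snippet runs on its own; it uses the same formula as the normalization function.

```r
# Same min-max formula, under a different (hypothetical) name
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

scaled <- min_max(c(2, 4, 6, 10))
scaled   # 0.00 0.25 0.50 1.00

# Caveat: a constant column has max(x) == min(x), so the result is NaN
constant <- min_max(c(3, 3))
```

Scaling matters for KNN because an attribute with a large numeric range would otherwise dominate the distance calculation.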

Training & Testing

# separate the data into training and testing sets to check model accuracy
# get training data
training <- irisNormalized[randSelection,] 
nrow(training)

# get testing data
testing <- irisNormalized[-randSelection,] 
nrow(testing)

Obtain the class label

# obtain the class labels of the training dataset, because they will
# be used as the cl argument of the knn classifier
targertClass <- iris[randSelection,5]
targertClass
summary(targertClass)

# extract the 5th column of the test dataset to measure the
# accuracy
testClass <- iris[-randSelection,5]
summary(testClass)

Load the class package for k-NN and build the model

library(class)
# building the model for classification
# run knn classifier
# here we use k = 10

classificationModel <-  knn(training,testing,cl=targertClass,k=10)
classificationModel

Confusion matrix

#create confusion matrix to check model 
# performance 
ConfMatrix <- table(classificationModel,testClass)
ConfMatrix

OUTPUT:

Model Accuracy

# Calculate model accuracy (%): correct predictions on the diagonal / all predictions
modelAccuracy <- function(x) { sum(diag(x)) / sum(x) * 100 }
modelAccuracy(ConfMatrix)
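
As a worked illustration of the accuracy calculation, the sketch below uses a hypothetical confusion matrix. The counts are invented for the example and are not output from the model above. It also shows a per-class breakdown, which can reveal which classes are being confused.

```r
# Hypothetical 3-class confusion matrix (rows: predicted, columns: actual)
species <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(9, 0, 0,
               0, 8, 1,
               0, 1, 8),
             nrow = 3, byrow = TRUE,
             dimnames = list(predicted = species, actual = species))

# Overall accuracy: 25 correct out of 27 predictions
overall <- sum(diag(cm)) / sum(cm) * 100

# Share of each actual class that was predicted correctly (per-class recall)
per_class <- diag(cm) / colSums(cm) * 100
```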

For help with the K-Nearest Neighbors algorithm or other machine learning algorithms, you can contact us or send your assignment requirements directly to:

realcode4you@gmail.com
