Before learning about K-Nearest Neighbors, let us first review supervised and unsupervised machine learning algorithms.

**Unsupervised Learning**

The instances are unlabeled, and the goal is to organize a collection of data items into categories such that the items within a category are more similar to each other than they are to items in the other categories.

Clustering is also a good approach for anomaly detection.

Example: K-means

**Supervised Learning**

Each object is labeled with a class, and the target is to find the predictive relationship between objects and class labels (the hypothesis).

Example:

*K*-NN (*K*-Nearest Neighbors)

Decision Trees (ID3, C4.5)

SVM (Support Vector Machines)

ANN (Artificial Neural Network)

NB (Naive Bayes)

**K-Nearest-Neighbors Algorithm**

K-Nearest Neighbors (KNN) is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (distance function).

KNN has been used in statistical estimation and pattern recognition since the 1970s.

A case is classified by a majority vote of its neighbors, with the case being assigned to the class most common among its K nearest neighbors as measured by a distance function.

If K = 1, then the case is simply assigned to the class of its nearest neighbor.
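The voting rule above can be sketched directly in R. Note that `knnPredict` is a hypothetical helper written for illustration, not a library function, and the training points, labels, and query are made-up data:

```
# toy KNN classifier: Euclidean distance + majority vote
# (knnPredict is an illustrative sketch; the data are made up)
knnPredict <- function(train, labels, query, k) {
  # Euclidean distance from the query point to every training row
  dists <- sqrt(rowSums(sweep(train, 2, query)^2))
  # class labels of the k closest training points
  nearest <- labels[order(dists)[1:k]]
  # majority vote: the most common label among the k neighbors
  names(which.max(table(nearest)))
}

train <- rbind(c(1, 1), c(1, 2), c(5, 5), c(6, 5))
labels <- c("A", "A", "B", "B")
knnPredict(train, labels, query = c(1.5, 1.5), k = 3) # "A"
```

With k = 3, the two "A" points near (1, 1) outvote the one "B" point, so the query is assigned to class "A"; with k = 1 only the single nearest point would decide.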

**Features**

All instances correspond to points in an n-dimensional Euclidean space

Classification is delayed until a new instance arrives

Classification is done by comparing feature vectors of the different points

Target function may be discrete or real-valued

- Instance-based learning algorithm

- Lazy learner: needs more computation time during classification

- Conceptually close to human intuition: e.g., people with similar income would live in the same neighborhood

**Classification strategy:**

K-NN assigns the instance to a class by identifying the most frequent class label among its K nearest neighbors.

In some cases, when numeric instances are involved, a proximity (distance) measure is required, e.g., Euclidean distance.
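The Euclidean distance between two numeric feature vectors is the square root of the summed squared differences. A one-line R sketch, using made-up points:

```
# Euclidean distance between two numeric feature vectors
euclidean <- function(x, y) sqrt(sum((x - y)^2))
euclidean(c(0, 0), c(3, 4)) # 5
```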

**KNN Example**

Similarity metric: Number of matching attributes (k=2)
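The original example table is not reproduced here, but the idea can be sketched with made-up categorical records: count how many attribute values match the query, keep the k = 2 most similar records, and take a majority vote (`matchSimilarity` and the records are hypothetical):

```
# similarity = number of matching attribute values
matchSimilarity <- function(a, b) sum(a == b)

# made-up categorical records and class labels
records <- list(c("sunny", "hot", "high"),
                c("rainy", "cool", "normal"),
                c("sunny", "mild", "normal"))
labels <- c("yes", "no", "yes")
query <- c("sunny", "mild", "high")

sims <- sapply(records, matchSimilarity, b = query)
sims # 2 0 2
# keep the k = 2 most similar records and take a majority vote
nearest <- labels[order(sims, decreasing = TRUE)[1:2]]
names(which.max(table(nearest))) # "yes"
```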

**Selecting the Number of Neighbors**

- Increase k: makes KNN less sensitive to noise

- Decrease k: allows capturing finer structure of the space

- Pick k not too large, but not too small (depends on the data)
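In practice, a common way to choose k is leave-one-out cross-validation, which the `class` package provides via `knn.cv`. A sketch on the iris predictors, assuming min-max normalization (the variable names `norm01`, `predictors`, and `errs` are illustrative):

```
library(class)
data(iris)
# min-max normalize the four numeric predictors
norm01 <- function(x) (x - min(x)) / (max(x) - min(x))
predictors <- as.data.frame(lapply(iris[, 1:4], norm01))
# leave-one-out cross-validated error rate for each candidate k
errs <- sapply(1:15, function(k) {
  pred <- knn.cv(predictors, cl = iris$Species, k = k)
  mean(pred != iris$Species)
})
which.min(errs) # index of the k with the lowest LOOCV error
```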

**Advantages and Disadvantages of KNN**

1. Needs a distance/similarity measure and attributes that “match” the target function.

2. For large training sets, KNN must make a pass through the entire dataset for each classification, which can be prohibitive.

3. Prediction accuracy can quickly degrade as the number of attributes grows.

**Using K-NN in R**

**Case study: Iris data set**

**Load your data**

```
# load the built-in iris dataset
data(iris)
# look into the data structure
head(iris)
str(iris)
dim(iris)
```

**Generate a random sample of all data**

```
# Generate a random sample of row indices,
# in this case 82% of the dataset.
set.seed(1) # for reproducibility
randSelection <- sample(1:nrow(iris), 0.82 * nrow(iris))
randSelection
```

**Normalization**

```
# min-max normalization function
normalization <- function(x) { (x - min(x)) / (max(x) - min(x)) }
# Run normalization on the columns which are the predictors
irisNormalized <- as.data.frame(lapply(iris[, 1:4], normalization))
summary(irisNormalized)
```

**Training & Testing**

```
## separate data into training and testing to check model accuracy
# get training data
training <- irisNormalized[randSelection,]
nrow(training)
# get testing data
testing <- irisNormalized[-randSelection,]
nrow(testing)
```

**Obtain the class label**

```
# obtain the class labels of the training dataset,
# as they will be used as an argument to the knn classifier
targetClass <- iris[randSelection, 5]
targetClass
summary(targetClass)
# extract the 5th column of the test dataset to
# measure the accuracy
testClass <- iris[-randSelection, 5]
summary(testClass)
```

**Install package class for k-NN & build the model**

```
# install.packages("class") # if not already installed
library(class)
# building the model for classification
# run the knn classifier; here we use k = 10
classificationModel <- knn(training, testing, cl = targetClass, k = 10)
classificationModel
```

**Confusion matrix**

```
# create a confusion matrix to check model performance
ConfMatrix <- table(classificationModel, testClass)
ConfMatrix
```


**Model Accuracy**

```
# Calculate model accuracy: % of correctly classified cases
modelAccuracy <- function(x) { sum(diag(x)) / sum(x) * 100 }
modelAccuracy(ConfMatrix)
```

*To get help with the K-Nearest Neighbors algorithm or other machine learning algorithms, you can contact us or directly send your assignment requirement details to:*

realcode4you@gmail.com
