# Machine Learning Assignment Help | Machine Learning Project Help: K-Means Clustering

k-means is one of the simplest unsupervised learning algorithms for solving the well-known clustering problem. The procedure follows a simple and easy way to partition a given data set into a certain number of clusters (say, k clusters) fixed a priori. The idea is to define k centers, one for each cluster. Because the placement of these centers strongly affects the result, a common technique is to place them as far away from each other as possible.

Basically, k-means partitions n data points into a set of k clusters, where each data point is assigned to its closest cluster center.

scikit-learn exposes the algorithm in two variants:

**1. Class**: `KMeans`, which is fitted to training data and learns the clusters.

**2. Function**: `k_means`, which, for given training data, returns an array of integer labels corresponding to the different clusters.

For the class, the labels over the training data are available in the `labels_` attribute.
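For example, a minimal sketch of the class interface (the four toy points here are made up for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [10, 2], [10, 4]])

# Fitting the class learns the clusters; labels_ then holds one integer
# cluster label per training sample.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)
```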

K-means is often referred to as Lloyd’s algorithm.

__Steps of the algorithm:__

In basic terms, the algorithm has three steps.

The first step chooses the initial centroids; the most basic method is to choose k samples at random from the dataset X.

The second step creates new centroids by taking the mean value of all of the samples assigned to each previous centroid.

Third, the difference between the old and the new centroids is computed, and the algorithm repeats the last two steps until this value is less than a threshold. In other words, it repeats until the centroids do not move significantly.
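The three steps above can be sketched directly in NumPy (a minimal illustration, not the scikit-learn implementation; the toy points, the `tol` threshold, and the assumption that no cluster becomes empty are choices made for this example):

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: choose k samples from X as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: each new centroid is the mean of the samples assigned to it
        # (this sketch assumes every cluster keeps at least one point)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 3: stop once the centroids no longer move significantly
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return centroids, labels

X = np.array([[5, 3], [10, 15], [15, 12], [24, 10], [30, 45],
              [85, 70], [71, 80], [60, 78], [55, 52], [80, 91]], dtype=float)
centroids, labels = lloyd_kmeans(X, 2)
print(centroids)
```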

__Example__

```python
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[5, 3], [10, 15], [15, 12], [24, 10], [30, 45],
              [85, 70], [71, 80], [60, 78], [55, 52], [80, 91]])

# Data visualization
plt.scatter(X[:, 0], X[:, 1], label='True Position')

# Creating clusters
kmeans = KMeans(n_clusters=2)
kmeans.fit(X)
print(kmeans.cluster_centers_)
```

__Output (cluster centers in 2D):__

```
[[16.8 17. ]
 [70.2 74.2]]
```

```python
# Show the cluster labels
print(kmeans.labels_)
```

```
[0 0 0 0 0 1 1 1 1 1]
```

```python
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='rainbow')
```

### Now let's plot the points along with the centroid coordinates:

```python
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='rainbow')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], color='black')
```

__MiniBatchKMeans__

The **MiniBatchKMeans** is a variant of the **KMeans** algorithm which uses mini-batches to reduce the computation time, while still attempting to optimise the same objective function. Mini-batches are subsets of the input data, randomly sampled in each training iteration.

In contrast to other algorithms that reduce the convergence time of k-means, mini-batch k-means produces results that are generally only slightly worse than the standard algorithm: **MiniBatchKMeans** converges faster than **KMeans**, but the quality of the results is somewhat reduced. In practice this difference in quality can be quite small.
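This trade-off can be checked empirically by fitting both estimators on the same data and comparing their inertia (a rough sketch; the blob data and parameter values are arbitrary choices for illustration):

```python
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=3000, centers=4, random_state=0)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
mbk = MiniBatchKMeans(n_clusters=4, batch_size=256, n_init=10, random_state=0).fit(X)

# inertia_ is the sum of squared distances to the closest centroid:
# lower is better, and the mini-batch result is usually only slightly worse.
print(km.inertia_, mbk.inertia_)
```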

__Example:__

```python
# Load libraries
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import MiniBatchKMeans

# Load data
iris = datasets.load_iris()
X = iris.data

# Standardize features
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
```

MiniBatchKMeans works similarly to KMeans, with one significant difference: the batch_size parameter, which sets the number of randomly selected observations in each mini-batch.

```python
# Create k-means object
clustering = MiniBatchKMeans(n_clusters=3, random_state=0, batch_size=100)

# Train model
model = clustering.fit(X_std)
model.cluster_centers_
```
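The fitted model can then assign new observations to the learned clusters. A small follow-on sketch (the measurement values are made up, and new observations must be scaled with the same scaler as the training data):

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import MiniBatchKMeans

iris = datasets.load_iris()
scaler = StandardScaler()
X_std = scaler.fit_transform(iris.data)
model = MiniBatchKMeans(n_clusters=3, random_state=0, batch_size=100, n_init=3).fit(X_std)

# A made-up observation, scaled exactly like the training data
new_obs = scaler.transform([[5.1, 3.5, 1.4, 0.2]])
print(model.predict(new_obs))  # the assigned cluster index
```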

__Affinity Propagation__

**Affinity Propagation** creates clusters by sending messages between pairs of samples until convergence. A dataset is then described using a small number of exemplars, which are identified as those most representative of other samples. The messages sent between pairs represent the suitability of one sample to be the exemplar of the other, and are updated in response to the values from other pairs.

__Example:__

Below is a Python implementation of Affinity Propagation clustering using the scikit-learn library:

```python
# Import all the libraries
from sklearn.cluster import AffinityPropagation
from sklearn import metrics
from sklearn.datasets import make_blobs

# Generate sample data
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=400, centers=centers,
                            cluster_std=0.5, random_state=0)

# Compute Affinity Propagation
af = AffinityPropagation(preference=-50, random_state=0).fit(X)
cluster_centers_indices = af.cluster_centers_indices_
labels = af.labels_

n_clusters_ = len(cluster_centers_indices)
```