k-means is one of the simplest unsupervised learning algorithms for solving the well-known clustering problem. It classifies a given data set into a number of clusters (say, k clusters) that is fixed a priori. The main idea is to define k centers, one for each cluster. Different starting positions for the centers lead to different results, so they should be placed with care; a good choice is to place them as far away from each other as possible.

In essence, it partitions n data points into k clusters, where each data point is assigned to the cluster whose center is closest.
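For reference, this assignment rule corresponds to minimizing the within-cluster sum of squared distances. The notation is an assumption introduced here, not taken from the text: $x_1, \dots, x_n$ are the data points, $\mu_1, \dots, \mu_k$ are the cluster centers, and $C_j$ is the set of points assigned to center $\mu_j$.

$$
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
$$

Each point contributes the squared distance to its own center, so moving a point to its closest center can never increase $J$.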

In scikit-learn, each clustering algorithm comes in two variants:

1. Class: implements the fit method to learn the clusters on training data.

2. Function: given training data, returns an array of integer labels corresponding to the different clusters.

For the class, the labels over the training data can be found in the labels_ attribute.
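The two variants can be compared side by side. The sketch below assumes scikit-learn's `KMeans` class and `k_means` function; the `n_init` and `random_state` values are arbitrary choices made only to keep the run reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans, k_means

X = np.array([[5, 3], [10, 15], [15, 12], [24, 10], [30, 45],
              [85, 70], [71, 80], [60, 78], [55, 52], [80, 91]])

# Variant 1: the class learns the clusters with fit() and stores the results.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
class_labels = model.labels_

# Variant 2: the function returns centers, integer labels and inertia directly.
centers, function_labels, inertia = k_means(
    X, n_clusters=2, n_init=10, random_state=0
)

print(class_labels)
print(function_labels)
```

The class is the better fit when the model is reused later (for example to label new points), while the function is convenient for a one-off clustering.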

K-means is often referred to as Lloyd's algorithm.

### Steps to do this:

In basic terms, the algorithm has three steps.

The first step chooses the initial centroids, with the most basic method being to choose k samples from the dataset X.

The second step assigns each sample to its nearest centroid and then creates new centroids by taking the mean value of all of the samples assigned to each previous centroid.

Third, the difference between the old and the new centroids is computed, and the algorithm repeats the second and third steps until this value is less than a threshold. In other words, it repeats until the centroids do not move significantly.
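The steps above can be sketched in plain NumPy. This is a minimal illustration, not scikit-learn's implementation: the function name `lloyd_kmeans` and its parameters `tol`, `max_iter` and `seed` are hypothetical, and initial centroids are drawn as k random samples from X.

```python
import numpy as np

def lloyd_kmeans(X, k, tol=1e-4, max_iter=100, seed=0):
    """Minimal sketch of Lloyd's algorithm (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Step 1: choose k distinct samples from X as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Step 2a: assign each sample to its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 2b: new centroid = mean of the samples assigned to it
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 3: stop once the centroids no longer move significantly.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, labels

X = np.array([[5, 3], [10, 15], [15, 12], [24, 10], [30, 45],
              [85, 70], [71, 80], [60, 78], [55, 52], [80, 91]], dtype=float)
centroids, labels = lloyd_kmeans(X, k=2)
print(centroids)
print(labels)
```

Because the result depends on the random starting centroids, different seeds can produce different clusterings; library implementations typically run several initializations and keep the best one.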

### Example

import matplotlib.pyplot as plt

%matplotlib inline

import numpy as np

from sklearn.cluster import KMeans

X = np.array([[5,3], [10,15], [15,12], [24,10], [30,45], [85,70], [71,80], [60,78], [55,52], [80,91],])

#data visualization

plt.scatter(X[:,0],X[:,1], label='True Position')

#Creating Clusters

kmeans = KMeans(n_clusters=2)

kmeans.fit(X)

print(kmeans.cluster_centers_)

### Output shown as a 2D array (one row per centroid):

[[ 16.8 17. ]

[ 70.2 74.2]]

#show the cluster labels

print(kmeans.labels_)

[0 0 0 0 0 1 1 1 1 1]

plt.scatter(X[:,0],X[:,1], c=kmeans.labels_, cmap='rainbow')

### Now let's plot the points along with the centroid coordinates:

plt.scatter(X[:,0], X[:,1], c=kmeans.labels_, cmap='rainbow')

plt.scatter(kmeans.cluster_centers_[:,0] ,kmeans.cluster_centers_[:,1], color='black')
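Once fitted, the model can also label points that were not part of the training data. The sketch below assumes the `predict` method of scikit-learn's `KMeans`; the two new points are hypothetical, and `n_init` and `random_state` are set only for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[5, 3], [10, 15], [15, 12], [24, 10], [30, 45],
              [85, 70], [71, 80], [60, 78], [55, 52], [80, 91]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Hypothetical new points, one near each group of training points.
new_points = np.array([[0, 0], [90, 90]])

# Each new point receives the label of its nearest learned centroid.
new_labels = kmeans.predict(new_points)
print(new_labels)
```

The label numbers themselves are arbitrary; what matters is that the two points end up in different clusters.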

### MiniBatchKMeans

The MiniBatchKMeans is a variant of the KMeans algorithm which uses mini-batches to reduce the computation time, while still attempting to optimise the same objective function. Mini-batches are subsets of the input data, randomly sampled in each training iteration.

In contrast to other algorithms that reduce the convergence time of k-means, mini-batch k-means produces results that are generally only slightly worse than the standard algorithm.

MiniBatchKMeans converges faster than KMeans, but the quality of the results is reduced. In practice this difference in quality can be quite small.

Example:

# Load libraries

from sklearn import datasets

from sklearn.preprocessing import StandardScaler

from sklearn.cluster import MiniBatchKMeans

# Load data

iris = datasets.load_iris()

X = iris.data

# Standardize features

scaler = StandardScaler()

X_std = scaler.fit_transform(X)

MiniBatchKMeans works similarly to KMeans, with one significant difference: the batch_size parameter, which sets the size of each mini-batch.

# Create mini-batch k-means object

clustering = MiniBatchKMeans(n_clusters=3, random_state=0, batch_size=100)

# Train model

model = clustering.fit(X_std)

model.cluster_centers_
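To get a feel for the quality trade-off, the two estimators can be fitted on the same standardized data and their objective values compared. This is a rough sketch: it assumes the `inertia_` attribute exposed by both scikit-learn estimators, and the `n_init` values are arbitrary choices.

```python
from sklearn import datasets
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.preprocessing import StandardScaler

# Same data preparation as in the example: standardized iris features.
X_std = StandardScaler().fit_transform(datasets.load_iris().data)

full = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_std)
mini = MiniBatchKMeans(
    n_clusters=3, n_init=3, random_state=0, batch_size=100
).fit(X_std)

# inertia_ is the sum of squared distances of samples to their closest center;
# lower values mean tighter clusters.
print("KMeans inertia:         ", full.inertia_)
print("MiniBatchKMeans inertia:", mini.inertia_)
```

On a data set as small as iris the speed advantage is negligible, so this comparison only illustrates how close the two objective values tend to be.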

### Affinity Propagation

Affinity Propagation creates clusters by sending messages between pairs of samples until convergence. A dataset is then described using a small number of exemplars, which are identified as those most representative of other samples. The messages sent between pairs represent the suitability for one sample to be the exemplar of the other, which is updated in response to the values from other pairs.
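The two kinds of messages are usually called responsibility $r(i,k)$ and availability $a(i,k)$. The update rules below follow the standard formulation by Frey and Dueck; the symbols, including the similarity $s(i,k)$ between samples $i$ and $k$, are not defined in the text above and are introduced here only for illustration.

$$
r(i,k) \leftarrow s(i,k) - \max_{k' \neq k} \bigl[ a(i,k') + s(i,k') \bigr]
$$

$$
a(i,k) \leftarrow \min\Bigl( 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\bigl(0,\, r(i',k)\bigr) \Bigr), \quad i \neq k
$$

Responsibility measures how well suited sample $k$ is to be the exemplar for sample $i$ compared with other candidates, and availability measures how appropriate it would be for $i$ to choose $k$, given the support $k$ receives from other samples. The self-similarity $s(k,k)$ is the preference: larger values make a sample more likely to become an exemplar.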

Example:

Below is a Python implementation of Affinity Propagation clustering using the scikit-learn library:

#import all the libraries

from sklearn.cluster import AffinityPropagation

from sklearn import metrics

from sklearn.datasets import make_blobs

# Generate sample data

centers = [[1, 1], [-1, -1], [1, -1], [-1, -1]]

X, labels_true = make_blobs(n_samples=400, centers=centers, cluster_std=0.5, random_state=0)

# Compute Affinity Propagation

af = AffinityPropagation(preference=-50).fit(X)

cluster_centers_indices = af.cluster_centers_indices_

labels = af.labels_

n_clusters_ = len(cluster_centers_indices)
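The example stops after counting the exemplars, so a short, self-contained follow-up can print the result and a quality score. This is a sketch under assumptions: `random_state=0` is added here for reproducibility and is not part of the original example, and the silhouette coefficient is one possible metric among several in `sklearn.metrics`.

```python
from sklearn import metrics
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Same data settings as the example (centers copied as written there).
centers = [[1, 1], [-1, -1], [1, -1], [-1, -1]]
X, labels_true = make_blobs(n_samples=400, centers=centers,
                            cluster_std=0.5, random_state=0)

af = AffinityPropagation(preference=-50, random_state=0).fit(X)
labels = af.labels_
n_clusters_ = len(af.cluster_centers_indices_)
print("Estimated number of clusters:", n_clusters_)

# The silhouette score needs at least two clusters, so guard the call.
if n_clusters_ > 1:
    score = metrics.silhouette_score(X, labels, metric="sqeuclidean")
    print("Silhouette coefficient: %0.3f" % score)
```

Unlike k-means, the number of clusters is not passed in; it follows from the data and the preference value, so changing `preference` changes how many exemplars are found.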
