Part 11 - Implementing K-Means Clustering in Python
Machine Learning Algorithms - Unsupervised Learning with scikit-learn
This article explains how to implement the K-Means clustering algorithm in Python. It covers importing necessary libraries, preparing data, initializing and fitting the model, retrieving cluster centers and labels, and printing the results.
Introduction to K-Means Clustering
K-Means clustering is an unsupervised learning algorithm that partitions data into K clusters. Each cluster is defined by its centroid (the mean of its points), and each data point is assigned to the cluster whose centroid is nearest. The algorithm iteratively adjusts the centroids to minimize the within-cluster variance, that is, the sum of squared distances from each point to its centroid.
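To make the assign-then-update loop concrete, here is a minimal NumPy sketch of the iteration. It is not how scikit-learn implements K-Means internally. The helper name simple_kmeans, its parameters, the sample points, and the hand-picked starting centroids are all assumptions made for illustration.

```python
import numpy as np

def simple_kmeans(points, initial_centroids, max_iters=100):
    """Hypothetical helper: a bare-bones assign/update loop."""
    centroids = np.asarray(initial_centroids, dtype=float)
    for _ in range(max_iters):
        # Assignment step: distance from every point to every centroid.
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Illustrative data: two well-separated groups of three points each.
points = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
                   [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

# Assumed starting centroids: one point taken from each group.
centroids, labels = simple_kmeans(points, initial_centroids=points[[0, 3]])
print(centroids)
print(labels)
```

The result depends on the starting centroids: a poor start can leave the loop stuck in a worse grouping, which is why library implementations use more careful initialization.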
Step-by-Step Implementation
Importing Libraries:
Import the KMeans class from sklearn.cluster. Import numpy for numerical operations.
Preparing Data:
Create a NumPy array X representing the data points in 2D space. Each inner list holds the x and y coordinates of one data point.
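As a quick illustration of that layout, the sketch below builds such an array and inspects its shape. The coordinate values are assumed for the example. scikit-learn estimators expect this two-dimensional layout, with one row per sample and one column per feature.

```python
import numpy as np

# Illustrative coordinates (assumed for this sketch).
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

print(X.shape)   # (number of points, number of coordinates per point)
print(X[:, 0])   # x coordinates of all points
print(X[:, 1])   # y coordinates of all points
```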
Initializing and Fitting the Model:
Initialize a K-Means clustering model with the number of clusters and a random state for reproducibility. For example, KMeans(n_clusters=2, random_state=42) initializes the model to partition the data into two clusters. Fit the K-Means model to the data X. During this process, the algorithm assigns data points to clusters by iteratively updating the positions of the centroids until convergence.
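One way to see what fitting did is to inspect the fitted model's n_iter_ and inertia_ attributes. The sketch below uses assumed sample coordinates, and it sets n_init=10 explicitly as an assumption, because the default number of restarts differs between scikit-learn versions.

```python
from sklearn.cluster import KMeans
import numpy as np

# Illustrative coordinates (assumed for this sketch).
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# n_init=10 (an assumption here) runs the algorithm from 10 different
# starting centroids and keeps the run with the lowest inertia.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(X)

# Number of iterations used by the best run.
print("Iterations:", kmeans.n_iter_)
# Sum of squared distances from each point to its closest centroid.
print("Inertia:", kmeans.inertia_)
```

A lower inertia means tighter clusters, but it always decreases as the number of clusters grows, so it is best used to compare runs with the same n_clusters.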
Retrieving Cluster Centers and Labels:
Retrieve the coordinates of the cluster centers (centroids) using kmeans.cluster_centers_. Retrieve the labels assigned to each data point using kmeans.labels_. Each label represents the cluster (0 or 1) to which the point belongs.
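The relationship between the two attributes can be checked by hand: each label should be the index of the nearest centroid. The sketch below does this and also assigns new points with predict. The sample coordinates, the new_points array, and n_init=10 are assumptions for illustration.

```python
from sklearn.cluster import KMeans
import numpy as np

# Illustrative coordinates (assumed for this sketch).
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

centroids = kmeans.cluster_centers_   # shape (n_clusters, n_features)
labels = kmeans.labels_               # shape (n_samples,)

# Recompute the nearest centroid for every point by hand.
distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
nearest = distances.argmin(axis=1)
print("Labels match nearest centroid:", np.array_equal(nearest, labels))

# Assign previously unseen points (hypothetical values) to the fitted clusters.
new_points = np.array([[0, 0], [12, 3]])
predicted = kmeans.predict(new_points)
print("Predicted clusters:", predicted)
```

Which group receives label 0 and which receives label 1 is arbitrary, so compare labels with each other rather than relying on a specific number.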
Printing the Results:
Print the cluster centers (centroids) and the assigned cluster labels for each data point.
Complete Code Example
# Import necessary libraries
from sklearn.cluster import KMeans
import numpy as np
# Prepare data (illustrative coordinates: six points forming two groups)
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
# Initialize the K-Means clustering model
kmeans = KMeans(n_clusters=2, random_state=42)
# Fit the model to the data
kmeans.fit(X)
# Get the cluster centers and labels
centroids = kmeans.cluster_centers_
labels = kmeans.labels_
# Print the results
print("Cluster Centers:\n", centroids)
print("Labels:", labels)
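As an optional follow-up, the labels can be used as a boolean mask to list the points in each cluster. This is a sketch with assumed sample coordinates and an explicit n_init=10, not part of the example above.

```python
from sklearn.cluster import KMeans
import numpy as np

# Illustrative coordinates (assumed for this sketch).
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
labels = kmeans.labels_

# Count how many points fall in each cluster.
sizes = np.bincount(labels)

# List the members of each cluster by masking X with the labels.
for cluster_id in np.unique(labels):
    members = X[labels == cluster_id]
    print(f"Cluster {cluster_id} ({sizes[cluster_id]} points):")
    print(members)
```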
Conclusion
This article demonstrates how to use K-Means clustering to group data points in 2D space into two clusters. After fitting the model, it outputs the centroids of each cluster and the assigned cluster labels for each data point, showing which points belong to which cluster.