Part 13 - Implementing DBSCAN in Python

Machine Learning Algorithms Series - Unsupervised Learning with scikit-learn

Feb 06, 2025

This article explains how to implement the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm in Python. It covers importing necessary libraries, preparing data, initializing and fitting the model, retrieving cluster labels, and printing the results.

Introduction to DBSCAN

DBSCAN is an unsupervised clustering algorithm that groups data points based on density, making it particularly effective for identifying clusters of arbitrary shapes and for handling noise (outliers). Its behavior is controlled by two main parameters:

eps: The maximum distance between two points for them to be considered neighbors.
min_samples: The minimum number of points within eps of a point (counting the point itself) for that point to be treated as part of a dense region, that is, as a core point.
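
To make the two parameters concrete, here is a minimal NumPy sketch of the neighborhood test they define. The sample coordinates and the names neighbor_counts and is_core are assumptions chosen for illustration. This is not scikit-learn's internal implementation, only the counting rule, in which each point counts itself as a neighbor.

```python
import numpy as np

# Hypothetical 2D points, chosen only for illustration
X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])
eps = 3
min_samples = 2

# Pairwise Euclidean distances between all points
diffs = X[:, None, :] - X[None, :, :]
distances = np.sqrt((diffs ** 2).sum(axis=-1))

# Number of points within eps of each point (the point itself is included)
neighbor_counts = (distances <= eps).sum(axis=1)

# A point is a core point when its eps-neighborhood holds at least min_samples points
is_core = neighbor_counts >= min_samples

print("Neighbor counts:", neighbor_counts.tolist())  # [3, 3, 3, 2, 2, 1]
print("Core points:", is_core.tolist())
```

The last point has only itself within eps, so it cannot be a core point under these settings.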

Step-by-Step Implementation

Importing Libraries:
- Import the DBSCAN class from sklearn.cluster.
- Import numpy for numerical operations.
Preparing Data:
- Create a NumPy array X representing the data points in 2D space. Each sublist represents one data point with its x and y coordinates.
Initializing and Fitting the Model:
- Initialize a DBSCAN clustering model with the eps and min_samples parameters. For example, DBSCAN(eps=3, min_samples=2) initializes the model with a maximum distance of 3 and a minimum of 2 samples to form a dense region.
- Fit the DBSCAN model to the data X. During this process, the algorithm classifies each point as either a core point, a border point, or noise. The algorithm does not require specifying the number of clusters in advance as it finds clusters based on density.
Retrieving Cluster Labels:
- Retrieve the labels assigned to each data point using dbscan.labels_. Each label identifies the cluster to which the point belongs. Points that are part of a cluster are assigned a non-negative integer label (starting at 0), while points considered noise are assigned a label of -1.
Printing the Results:
- Print the assigned cluster labels for each data point.
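
The core, border, and noise distinction mentioned in the fitting step can be recovered from a fitted model. The sketch below is one way to do it, assuming a small hypothetical dataset and using scikit-learn's core_sample_indices_ attribute together with labels_. It uses min_samples=3 because with min_samples=2 every point that has a neighbor is itself a core point, so no border points would appear.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical data: four points in a row plus one far-away point
X = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [10, 10]])

dbscan = DBSCAN(eps=1.5, min_samples=3).fit(X)
labels = dbscan.labels_

# Core points are reported directly by the fitted model
core_mask = np.zeros(len(X), dtype=bool)
core_mask[dbscan.core_sample_indices_] = True

# Noise points carry the label -1; border points are in a cluster but are not core
noise_mask = labels == -1
point_types = np.where(core_mask, "core", np.where(noise_mask, "noise", "border"))

for point, label, kind in zip(X.tolist(), labels.tolist(), point_types.tolist()):
    print(point, label, kind)
```

Here the two middle points of the row are core points, the two end points are border points attached to the same cluster, and the far-away point is noise.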

Complete Code Example

# Import necessary libraries
from sklearn.cluster import DBSCAN
import numpy as np

# Prepare data (illustrative coordinates: two tight groups and one isolated point)
X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])

# Initialize the DBSCAN clustering model
dbscan = DBSCAN(eps=3, min_samples=2)

# Fit the model to the data
dbscan.fit(X)

# Get the cluster labels
labels = dbscan.labels_

# Print the results
print("Labels:", labels)
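
Once the labels are available, a common next step is to count the clusters and noise points and to group the data by cluster. The following is a minimal sketch of that, assuming illustrative coordinates; the names n_clusters, n_noise, and clusters are hypothetical helpers, not part of scikit-learn.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Illustrative coordinates (assumed values)
X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])
labels = DBSCAN(eps=3, min_samples=2).fit(X).labels_

# Cluster ids are every distinct label except -1, which marks noise
cluster_ids = [int(label) for label in np.unique(labels) if label != -1]
n_clusters = len(cluster_ids)
n_noise = int(np.sum(labels == -1))

# Group the original points by the cluster they were assigned to
clusters = {cid: X[labels == cid].tolist() for cid in cluster_ids}

print("Clusters found:", n_clusters)  # 2 for this data
print("Noise points:", n_noise)       # 1 for this data
print("Members:", clusters)
```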


Conclusion

This article demonstrated how to use DBSCAN to cluster points in 2D space. DBSCAN is useful for identifying clusters of arbitrary shape and for handling noise, which is marked with the label -1. The algorithm groups points that lie in dense regions and flags sparse, isolated points as noise.
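
As a closing illustration of the arbitrary-shape claim, the sketch below runs DBSCAN on two interleaving half circles generated with scikit-learn's make_moons. The dataset, the eps=0.2 and min_samples=5 settings, and the agreement check are assumptions picked for this example, not recommended defaults.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half circles without added noise; y holds the true moon of each point
X, y = make_moons(n_samples=200, random_state=0)

labels = DBSCAN(eps=0.2, min_samples=5).fit(X).labels_

n_clusters = len([label for label in np.unique(labels) if label != -1])
n_noise = int(np.sum(labels == -1))

# Cluster ids are arbitrary, so compare against the true moons in both orderings
agreement = max(np.mean(labels == y), np.mean(labels == 1 - y))

print("Clusters found:", n_clusters)
print("Noise points:", n_noise)
print("Agreement with true moons:", agreement)
```

Because neighboring points along each half circle are much closer together than the gap between the two shapes, density alone is enough to separate them here.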

School of AI | Newsletter
