Part 16 - Implementing t-Distributed Stochastic Neighbor Embedding (t-SNE) in Python
Machine Learning Algorithms Series - Visualizing High-Dimensional Data with scikit-learn
This article explains how to implement t-Distributed Stochastic Neighbor Embedding (t-SNE) in Python. t-SNE is a dimensionality reduction technique used primarily for visualizing high-dimensional data in 2D or 3D space. Unlike PCA, t-SNE is nonlinear and focuses on preserving the local structure of the data, making it highly effective for visualizing clusters. The walkthrough covers importing the necessary libraries, preparing the data, initializing and fitting the model, transforming the data, and printing the results. Note that t-SNE is computationally intensive and best suited to small and medium-sized datasets.
Step-by-Step Implementation
Importing Libraries:
Import the TSNE class from sklearn.manifold. Import numpy for numerical operations.
Preparing Data:
Create a NumPy array X representing the data points in a high-dimensional space (e.g., 3D). Each sublist represents a data point with X, Y, and Z coordinates.
Initializing and Fitting the Model:
Initialize the t-SNE model with the number of components (dimensions) to reduce to and a random state for reproducibility. It is also important to set the perplexity parameter. For example, TSNE(n_components=2, random_state=42, perplexity=5) initializes the model to reduce the data to 2D, sets a random seed, and sets the perplexity value. Fit the t-SNE model to the data X and transform it into a lower-dimensional space using tsne.fit_transform(X). This both fits the model and applies the transformation, generating the 2D coordinates for each data point.
Printing the Results:
Print the transformed data in the reduced 2D space. Each row in the reduced data represents a data point in 2D space, where the values correspond to the new coordinates derived through the t-SNE transformation.
Complete Code Example
# Import necessary libraries
from sklearn.manifold import TSNE
import numpy as np
# Prepare data
# The seven 3D points below are illustrative placeholder values
X = np.array([[2.5, 2.4, 1.5], [0.5, 0.7, 1.1], [2.2, 2.9, 2.0],
              [1.9, 2.2, 1.8], [3.1, 3.0, 2.5], [2.3, 2.7, 2.1],
              [2.0, 1.6, 1.4]])
# Initialize the t-SNE model
tsne = TSNE(n_components=2, random_state=42, perplexity=5)
# Fit the model to the data and transform it
X_reduced = tsne.fit_transform(X)
# Print the results
print("Reduced data:\n", X_reduced)
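Since the main use of t-SNE is visualization, the reduced coordinates are usually plotted rather than printed. The sketch below extends the example with a matplotlib scatter plot; the data values and output file name are illustrative, not from the article:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Example 3D data (illustrative values)
X = np.array([[2.5, 2.4, 1.5], [0.5, 0.7, 1.1], [2.2, 2.9, 2.0],
              [1.9, 2.2, 1.8], [3.1, 3.0, 2.5], [2.3, 2.7, 2.1],
              [2.0, 1.6, 1.4]])

# Reduce to 2D as in the article
tsne = TSNE(n_components=2, random_state=42, perplexity=5)
X_reduced = tsne.fit_transform(X)

# Scatter plot of the 2D embedding
plt.scatter(X_reduced[:, 0], X_reduced[:, 1])
plt.xlabel("t-SNE component 1")
plt.ylabel("t-SNE component 2")
plt.title("t-SNE embedding of the example data")
plt.savefig("tsne_embedding.png")
```

Because t-SNE coordinates have no intrinsic meaning, the axes are labeled generically; only the relative positions of points matter.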
Key Considerations
Perplexity: This parameter controls the balance between local and global aspects of the data in the embedding. It is generally recommended to set it between 5 and 50, and in scikit-learn it must be strictly less than the number of samples. You may need to adjust the perplexity to optimize the t-SNE output.
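Because the right perplexity is data-dependent, it can help to run t-SNE with several values and compare the results. A sketch of such a sweep, assuming the same seven-point toy array (values are illustrative); note that only small perplexities are valid for so few samples:

```python
import numpy as np
from sklearn.manifold import TSNE

# Illustrative seven-point 3D dataset
X = np.array([[2.5, 2.4, 1.5], [0.5, 0.7, 1.1], [2.2, 2.9, 2.0],
              [1.9, 2.2, 1.8], [3.1, 3.0, 2.5], [2.3, 2.7, 2.1],
              [2.0, 1.6, 1.4]])

# Perplexity must be less than n_samples (7 here)
for perplexity in [2, 3, 5]:
    tsne = TSNE(n_components=2, random_state=42, perplexity=perplexity)
    X_reduced = tsne.fit_transform(X)
    # kl_divergence_ is the final value of the optimized objective;
    # it is a sanity check, not directly comparable across perplexities
    print(f"perplexity={perplexity}: KL divergence={tsne.kl_divergence_:.3f}")
```

In practice the embeddings themselves are inspected visually for each perplexity, since no single number identifies the "best" setting.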
Variance: Unlike PCA, t-SNE does not capture variance but instead focuses on preserving local structure.
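To make this contrast concrete, the same toy data can be reduced with both methods; only PCA reports an explained-variance ratio, while t-SNE exposes no such measure (array values are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Illustrative 3D data
X = np.array([[2.5, 2.4, 1.5], [0.5, 0.7, 1.1], [2.2, 2.9, 2.0],
              [1.9, 2.2, 1.8], [3.1, 3.0, 2.5], [2.3, 2.7, 2.1],
              [2.0, 1.6, 1.4]])

# PCA: linear projection chosen to maximize retained variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print("PCA explained variance ratio:", pca.explained_variance_ratio_)

# t-SNE: nonlinear embedding that preserves local neighborhoods;
# it has no notion of explained variance
tsne = TSNE(n_components=2, random_state=42, perplexity=5)
X_tsne = tsne.fit_transform(X)
print("t-SNE output shape:", X_tsne.shape)
```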
Conclusion
This article demonstrates how to use t-SNE to reduce a dataset from a higher dimension (e.g., 3D) to 2D for visualization purposes. t-SNE is particularly effective at creating visually interpretable representations of complex, high-dimensional data by clustering similar points close together, revealing patterns that may not be apparent in higher dimensions.