Part 24 - Convolutional Neural Networks (CNNs) for Image Classification

Machine Learning Algorithms - Implementing a CNN for Handwritten Digit Recognition in Python

Feb 06, 2025

Following our series on machine learning algorithms, this article introduces Convolutional Neural Networks (CNNs), a powerful type of deep learning model specifically designed for processing structured grid data like images. CNNs use convolutional layers to capture spatial hierarchies and features such as edges, textures, and shapes. This article will guide you through implementing a CNN for image classification, using the MNIST dataset of handwritten digits.

Understanding Convolutional Neural Networks

Deep Learning Models: CNNs are a class of deep learning models optimized for processing grid-like data.
Convolutional Layers: These layers apply filters to the input image, capturing spatial features.
Feature Extraction: CNNs automatically learn hierarchical features, from basic edges to complex shapes.
Applications: CNNs are widely used in computer vision tasks such as image classification, object detection, and image segmentation.
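
To make the idea of a filter concrete, here is a minimal NumPy sketch of what a single convolutional filter computes. It is an illustration under assumptions, not how Keras implements Conv2D internally: the helper name conv2d_valid, the 5x5 toy image, and the hand-picked edge filter are all hypothetical. In a real CNN the filter weights are learned during training rather than chosen by hand.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Like most deep learning layers, this does not flip the kernel,
    so strictly speaking it is a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the window element-wise by the kernel and sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 image: dark on the left (0), bright on the right (1)
image = np.array([[0, 0, 0, 1, 1]] * 5, dtype=float)

# Hand-picked vertical-edge filter (hypothetical; a CNN learns its own)
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The resulting feature map is zero where the window covers a uniform region and positive where it straddles the dark-to-bright edge, which is the sense in which a filter "captures" a spatial feature.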

Step-by-Step Implementation

Import Libraries:
- tensorflow: The main library for building and training deep learning models.
- tensorflow.keras.layers and tensorflow.keras.models: Modules for defining neural network layers and assembling them into models.
- tensorflow.keras.datasets: Includes the MNIST dataset for handwritten digit images.
Load and Pre-process the MNIST Dataset:
- Load the MNIST dataset using mnist.load_data(), which returns the data already split into a training set (60,000 images) and a test set (10,000 images). The training set is used to teach the CNN to classify digits, and the test set is used to evaluate its performance on new data.
- Normalize the pixel values by dividing them by 255.0, which rescales the intensities from the original 0 to 255 range to values between 0 and 1.
- Reshape the training and test data to the correct input shape for the CNN, specifying the number of samples, image dimensions (28x28), and the grayscale channel.
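
The effect of these pre-processing steps can be checked without downloading anything. The sketch below is an assumption-laden stand-in: fake_images is random data with the same dtype and per-image shape as MNIST, not the real dataset, and the variable names are illustrative.

```python
import numpy as np

# Stand-in for MNIST: 4 random 28x28 images with uint8 pixels in [0, 255]
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Dividing by a float converts to floating point and rescales to [0, 1]
scaled = fake_images / 255.0

# Add the trailing channel axis; -1 lets NumPy infer the sample count
batch = scaled.reshape(-1, 28, 28, 1)

print(batch.shape, batch.dtype, batch.min(), batch.max())
```

The trailing 1 is the single grayscale channel that Conv2D expects; an RGB image would have 3 there instead.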
Define the CNN Model:
- Create a sequential model using models.Sequential(), where layers are stacked one after another.
- Add convolutional layers (layers.Conv2D) with specified filters, kernel size, activation function (ReLU), and input shape. For example: layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)). 32 is the number of filters, (3,3) is the kernel size, and ReLU is the activation function.
- Add max pooling layers (layers.MaxPooling2D) to reduce spatial dimensions and down-sample feature maps. For example: layers.MaxPooling2D((2, 2)).
- Add a flatten layer (layers.Flatten) to convert the 2D feature maps into a 1D vector for the fully connected layers.
- Add dense (fully connected) layers (layers.Dense) with ReLU activation, leading to the final output layer with softmax activation for multi-class classification. For example: layers.Dense(64, activation='relu') and layers.Dense(10, activation='softmax').
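
The ReLU, max pooling, and flatten steps are simple enough to reproduce by hand. The following NumPy sketch uses a made-up 4x4 feature map, and the helper names relu and max_pool_2x2 are hypothetical. It mirrors what the layers compute on a single channel, not how Keras implements them.

```python
import numpy as np

def relu(x):
    """Replace negative activations with zero."""
    return np.maximum(x, 0)

def max_pool_2x2(fmap):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h2, w2 = fmap.shape[0] // 2, fmap.shape[1] // 2
    trimmed = fmap[:h2 * 2, :w2 * 2]  # drop an odd trailing row/column
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

# Made-up 4x4 feature map, as a convolution might produce
fmap = np.array([[ 1, -2,  3,  0],
                 [-1,  5, -3,  2],
                 [ 0,  1, -4, -1],
                 [ 2, -6,  7,  8]], dtype=float)

activated = relu(fmap)            # negatives become 0
pooled = max_pool_2x2(activated)  # 4x4 -> 2x2
flat = pooled.flatten()           # 2x2 -> vector of length 4

print(pooled)
print(flat)
```

Pooling halves each spatial dimension while keeping the strongest response in each block, and flattening turns what remains into the 1D vector that the dense layers consume.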
Compile the Model:
- Configure the model for training using model.compile().
- Specify the optimizer (Adam), loss function (sparse categorical cross-entropy, which takes integer class labels such as 0 to 9 rather than one-hot vectors), and metrics (accuracy).
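
To show what the loss and metric measure, here is a small NumPy sketch of softmax, sparse categorical cross-entropy, and accuracy. The logits and labels are invented for illustration, and the function names are local helpers rather than Keras APIs. Keras computes the same quantities with additional numerical safeguards.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 per row."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(labels, probs):
    """Negative log of the probability assigned to the true class."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(picked)

# Invented scores for 2 samples over 3 classes, with integer labels
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])

probs = softmax(logits)
losses = sparse_categorical_crossentropy(labels, probs)
accuracy = np.mean(probs.argmax(axis=1) == labels)

print(probs)
print(losses.mean(), accuracy)
```

The loss shrinks toward zero as the probability of the correct class approaches 1, while accuracy only checks whether the most probable class matches the label.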
Train the Model:
- Train the model using model.fit() with training data, epochs, batch size, and validation split. The validation split reserves a portion of the training data to evaluate model performance during training.
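
The sketch below illustrates what a validation split of 0.2 means for the sample counts. It assumes the hold-out is taken from the end of the training arrays, which is how the Keras documentation describes validation_split. The helper split_last_fraction is hypothetical and is not part of Keras.

```python
import math
import numpy as np

def split_last_fraction(x, y, validation_split):
    """Hold out the last `validation_split` fraction of the samples."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

# Toy data: 10 samples with dummy labels
x = np.arange(10)
y = np.arange(10) % 2

(x_fit, y_fit), (x_val, y_val) = split_last_fraction(x, y, 0.2)
print(len(x_fit), len(x_val))

# The same arithmetic at MNIST scale (60,000 training images)
n_fit = int(60000 * (1.0 - 0.2))
n_val = 60000 - n_fit
steps_per_epoch = math.ceil(n_fit / 64)  # batches per epoch at batch_size=64
print(n_fit, n_val, steps_per_epoch)
```

Because the held-out samples are never used for weight updates, the validation accuracy reported after each epoch gives an early signal of overfitting.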
Evaluate the Model:
- Evaluate the model on the test data using model.evaluate() to calculate the test loss and accuracy.
- Print the test accuracy to assess how well the model generalizes to unseen data.

Complete Code Example:

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist

# Load and pre-process the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

# Define the CNN model
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2)

# Evaluate the model
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print(f'Test accuracy: {test_accuracy}')
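
As a sanity check on the architecture above, the output shape and parameter count of each layer can be worked out by hand. The following pure-Python sketch assumes the Keras defaults used in the code (stride 1 and no padding for Conv2D, non-overlapping windows for MaxPooling2D). The helper functions are illustrative and not part of any library.

```python
def conv2d(shape, filters, kernel=3):
    """Shape and parameter count for a stride-1, unpadded convolution."""
    h, w, c = shape
    params = kernel * kernel * c * filters + filters  # weights + biases
    return (h - kernel + 1, w - kernel + 1, filters), params

def max_pool(shape, pool=2):
    """Pooling shrinks height and width and has no trainable parameters."""
    h, w, c = shape
    return (h // pool, w // pool, c), 0

def dense(inputs, units):
    """Fully connected layer: one weight per input-unit pair plus biases."""
    return units, inputs * units + units

shape, total = (28, 28, 1), 0
steps = [
    ("Conv2D(32)", lambda s: conv2d(s, 32)),
    ("MaxPooling2D", max_pool),
    ("Conv2D(64)", lambda s: conv2d(s, 64)),
    ("MaxPooling2D", max_pool),
    ("Conv2D(64)", lambda s: conv2d(s, 64)),
]
for name, step in steps:
    shape, params = step(shape)
    total += params
    print(f"{name:<14} {str(shape):<14} {params}")

flat = shape[0] * shape[1] * shape[2]  # what Flatten produces
print(f"{'Flatten':<14} {flat}")

width = flat
for units in (64, 10):
    width, params = dense(width, units)
    total += params
    print(f"Dense({units})".ljust(14), str(width).ljust(14), params)

print("Total parameters:", total)
```

Calling model.summary() on the model defined above should report matching shapes and counts, which is a quick way to confirm that the network is wired as intended.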

Conclusion

CNNs provide a powerful approach to image classification by automatically learning hierarchical features from raw pixel data. By implementing a CNN with convolutional and pooling layers, we can achieve high accuracy in tasks like handwritten digit recognition. The flexibility and effectiveness of CNNs make them an essential tool in modern computer vision applications.
