Part 26 - Long Short-Term Memory (LSTM) Networks for Sentiment Analysis
Machine Learning Algorithms Series - Implementing an LSTM Network for Enhanced Movie Review Sentiment Analysis in Python
Following our exploration of RNNs, this article introduces Long Short-Term Memory (LSTM) networks, a specialized type of RNN designed to capture long-term dependencies in sequential data. LSTMs use gating mechanisms to control the flow of information, which helps prevent the vanishing gradient problem that standard RNNs often face. This article guides you through implementing an LSTM network for sentiment analysis, using the IMDb dataset to classify movie reviews.
Understanding Long Short-Term Memory Networks
Long-Term Dependencies: LSTMs are designed to capture relationships in sequential data that span long distances.
Gating Mechanisms: LSTMs use gates to regulate the flow of information, allowing them to selectively remember or forget information over time (see the sketch after this list).
Vanishing Gradient Problem: LSTMs mitigate the vanishing gradient problem, which can hinder the training of standard RNNs.
Applications: LSTMs are commonly used in tasks like language modeling, machine translation, and time series prediction.
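To make the gating idea concrete, here is a minimal NumPy sketch of a single LSTM step. This is not the Keras implementation, just the standard cell equations; the stacked weight matrices W and U and the bias b are assumed to be pre-initialized parameters:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # Joint linear transform of the current input and the previous hidden
    # state; W, U, b stack the parameters of all four gates.
    z = W @ x_t + U @ h_prev + b
    f, i, o, g = np.split(z, 4)
    f = sigmoid(f)            # forget gate: what to erase from the cell state
    i = sigmoid(i)            # input gate: what new information to store
    o = sigmoid(o)            # output gate: what to expose as the hidden state
    g = np.tanh(g)            # candidate values for the cell state
    c_t = f * c_prev + i * g  # cell state carries the long-term memory
    h_t = o * np.tanh(c_t)    # hidden state is the step's output
    return h_t, c_t

Because the cell state c_t is updated additively, gradients can flow across many time steps without vanishing the way they do through the repeated squashing of a standard RNN.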
Step-by-Step Implementation
Import Libraries:
tensorflow: The main library for deep learning, providing tools to define and train neural networks.
tensorflow.keras.layers: Modules used to define the layers and structure of the neural network.
tensorflow.keras.datasets: Includes the IMDb dataset of movie reviews.
tensorflow.keras.preprocessing.sequence: A utility for pre-processing sequences, specifically to pad or truncate sequences to the same length.
Load and Pre-process the IMDb Dataset:
Set max_features = 10000 to limit the vocabulary size to the top 10,000 most frequent words in the dataset.
Set max_length = 500 to limit each movie review to 500 words; longer reviews are truncated, and shorter ones are padded.
Load the IMDb dataset using imdb.load_data(num_words=max_features), which returns the data already split into training and testing sets.
Pad or truncate each review in the training and testing sets with sequence.pad_sequences() so that every review has exactly max_length words. A quick sanity check of the result follows below.
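Assuming the dataset has been loaded and padded as in the complete code below, this snippet prints the shape of the padded arrays and decodes the first review back to words using the documented imdb.get_word_index() helper. Keras reserves indices 0-2 for padding, start-of-sequence, and unknown tokens, so the word indices are offset by 3:

# Each padded set has shape (num_reviews, max_length)
print(x_train.shape, x_test.shape)

# Decode the first training review back to text
word_index = imdb.get_word_index()
reverse_index = {i + 3: w for w, i in word_index.items()}  # indices 0-2 are reserved
print(' '.join(reverse_index.get(i, '?') for i in x_train[0] if i > 2))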
Define the LSTM Model:
Create a sequential model using models.Sequential(), where each layer's output is passed to the next layer.
Add an embedding layer (layers.Embedding) to convert word indices into dense vectors of a fixed size.
layers.Embedding(max_features, 32, input_length=max_length): This layer converts word indices into dense vectors of a fixed size; max_features is the vocabulary size, 32 sets the size of each word vector (the embedding dimension), and input_length fixes the length of the input sequences at 500 words.
Add an LSTM layer (layers.LSTM) with 32 units.
layers.LSTM(32): LSTM layers are designed to capture sequential dependencies in the data, making them well-suited for text processing.
Add a dense output layer (layers.Dense) with one neuron and a sigmoid activation function to predict the probability that a review is positive.
layers.Dense(1, activation='sigmoid'): The fully connected output layer uses one neuron and a sigmoid activation function to output a probability between 0 and 1 for binary classification.
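With the three layers stacked, model.summary() is a quick way to verify the architecture and parameter counts. With the values above, the embedding layer contributes 10,000 x 32 = 320,000 parameters, the LSTM layer 4 x (32 + 32 + 1) x 32 = 8,320, and the dense layer 33:

# Verify the architecture; building first ensures the input shape is known
model.build(input_shape=(None, max_length))
model.summary()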
Compile the Model:
Configure the model for training using model.compile(), specifying the optimizer (Adam), the loss function (binary cross-entropy), and the evaluation metric (accuracy).
optimizer='adam': Uses the Adam optimizer, which adjusts learning rates dynamically during training.
loss='binary_crossentropy': The binary cross-entropy loss function is appropriate for binary classification tasks.
metrics=['accuracy']: Specifies accuracy as the evaluation metric.
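The string 'adam' uses the optimizer's default settings; to control the learning rate explicitly, you can pass an optimizer instance instead. An equivalent sketch, writing out Adam's default learning rate of 0.001:

# Equivalent compile call with an explicit optimizer instance
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])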
Train the Model:
Train the model using model.fit() with the training data, number of epochs, batch size, and validation split.
model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2): Trains the model on the training data, with epochs=5 to iterate over the entire training dataset five times, batch_size=64 to set the number of samples processed per training batch, and validation_split=0.2 to reserve 20% of the training data for validation.
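model.fit() also returns a History object whose history dictionary records the loss and accuracy per epoch, which is useful for spotting overfitting. A short sketch for plotting the training and validation accuracy curves, assuming matplotlib is installed:

import matplotlib.pyplot as plt

history = model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2)

# history.history maps metric names to per-epoch values
plt.plot(history.history['accuracy'], label='training accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()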
Evaluate the Model:
Evaluate the model on the test data using model.evaluate() to calculate the test loss and accuracy.
test_loss, test_accuracy = model.evaluate(x_test, y_test): Evaluates the model on the test set, returning the test loss and accuracy.
Print the test accuracy to show the model's generalization performance on unseen data.
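Beyond the aggregate accuracy, you can score individual reviews with model.predict(). Since the sigmoid output is the probability of a positive review, thresholding at 0.5 gives the predicted class; a small sketch on the first five test reviews:

# Probabilities of positive sentiment for the first five test reviews
probs = model.predict(x_test[:5])

for p, label in zip(probs.flatten(), y_test[:5]):
    predicted = 'positive' if p > 0.5 else 'negative'
    print(f'p(positive)={p:.3f} -> predicted {predicted}, actual {label}')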
Complete Code Example:
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
# Load and pre-process the IMDb dataset
max_features = 10000
max_length = 500
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=max_length)
x_test = sequence.pad_sequences(x_test, maxlen=max_length)
# Define the LSTM model
model = models.Sequential([
    layers.Embedding(max_features, 32, input_length=max_length),
    layers.LSTM(32),
    layers.Dense(1, activation='sigmoid')
])
# Compile the model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
# Train the model
model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2)
# Evaluate the model
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print(f'Test accuracy: {test_accuracy}')
Conclusion
LSTMs enhance sentiment analysis by effectively capturing long-term dependencies in text. By implementing an LSTM network with embedding and recurrent layers, we achieve improved accuracy in movie review sentiment classification. The gating mechanisms in LSTMs make them ideal for complex sequential data analysis, outperforming traditional RNNs in tasks requiring memory of long-range context.