Part 2 - Implementing Ridge and Lasso Regression in Python

Machine Learning Algorithms - Regularization Techniques for Predicting House Prices with scikit-learn

Feb 06, 2025

This article explains how to implement Ridge and Lasso regression in Python to predict house prices, using regularization techniques to prevent overfitting. It covers importing necessary libraries, preparing data, training the models, making predictions, and evaluating performance.

Introduction to Ridge and Lasso Regression

Ridge and Lasso regression are regularization techniques applied to linear regression to prevent overfitting by penalizing large coefficients. Ridge regression adds an L2 penalty (the sum of squared coefficients), which shrinks coefficients toward zero without eliminating them. Lasso regression adds an L1 penalty (the sum of absolute coefficient values), which can shrink some coefficients to exactly zero and so acts as a form of feature selection.
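
To make the two penalty terms concrete, here is a minimal sketch that computes them for a made-up coefficient vector. The values in coefs and the choice alpha = 1.0 are assumptions for illustration only. scikit-learn also scales the data-fit part of the objective differently for Ridge and Lasso, so the same alpha is not directly comparable between the two models.

```python
import numpy as np

# Hypothetical coefficient vector, chosen only for illustration.
coefs = np.array([3.0, -2.0, 0.5, 0.0])
alpha = 1.0  # regularization strength

# Ridge (L2): alpha times the sum of squared coefficients.
l2_penalty = alpha * np.sum(coefs ** 2)

# Lasso (L1): alpha times the sum of absolute coefficient values.
l1_penalty = alpha * np.sum(np.abs(coefs))

print("L2 (Ridge) penalty:", l2_penalty)
print("L1 (Lasso) penalty:", l1_penalty)
```

The squared term makes the L2 penalty grow much faster for large coefficients, while the L1 penalty charges every unit of coefficient size equally, which is what lets it push small coefficients all the way to zero.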

Step-by-Step Implementation

Importing Libraries:
- Import the Ridge and Lasso classes from sklearn.linear_model.
- Import train_test_split from sklearn.model_selection to split the dataset into training and testing sets.
- Import mean_squared_error from sklearn.metrics to evaluate the models.
- Import numpy for numerical computations.
Preparing Data:
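- Create a two-dimensional NumPy array X of shape (n_samples, 1) representing house sizes in square feet; scikit-learn expects one row per sample and one column per feature.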
- Create a NumPy array y representing house prices.
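
A minimal sketch of the data preparation step follows. The sizes and prices are illustrative values, not the article's dataset. The point is the shape: a single feature still has to be arranged as one column.

```python
import numpy as np

# Illustrative values only (assumed for this sketch).
sizes_sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
prices = [245000, 312000, 279000, 308000, 199000,
          219000, 405000, 324000, 319000, 255000]

# scikit-learn estimators expect X with shape (n_samples, n_features),
# so the single feature is reshaped into one column.
X = np.array(sizes_sqft).reshape(-1, 1)
y = np.array(prices)

print(X.shape, y.shape)
```

Passing a one-dimensional X to fit raises an error in scikit-learn, which is why the reshape is needed.
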
Splitting the Data:
- Use train_test_split to divide the data into training and testing sets.
- Specify test_size=0.2 to use 20% of the data for testing and 80% for training.
- Set random_state=42 for reproducibility.
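
The split can be checked with a short sketch. The stand-in arrays below are hypothetical; only their length (10 samples) matters here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 10 samples with one feature each (hypothetical values).
X = np.arange(10).reshape(-1, 1)
y = np.arange(10) * 2

# 20% of 10 samples goes to the test set; the rest is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))
```

With only two test samples, the resulting error estimate is very noisy, so a dataset this small is best treated as a walkthrough rather than a benchmark.
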
Ridge Regression:
- Initializing the Ridge Model:
  - Create an instance of the Ridge class with a specified alpha value (e.g., alpha=1.0). The alpha parameter controls the strength of the L2 regularization: larger values shrink the coefficients more.
- Training the Model:
  - Train the Ridge model using the training data (X_train, y_train).
- Making Predictions:
  - Use the trained model to predict house prices for the testing set X_test.
- Evaluating the Model:
  - Calculate the mean squared error (MSE) between the actual (y_test) and predicted values (ridge_pred).
  - Print the Ridge mean squared error.
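
To see what alpha does in Ridge, the sketch below fits the model for several alpha values and prints the fitted slope. The data is hypothetical and deliberately small in scale so that the shrinkage is easy to see.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical small-scale data, chosen so the effect of alpha is visible.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slopes = []
for alpha in [0.01, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    slopes.append(model.coef_[0])
    print(f"alpha={alpha:>6}: slope={model.coef_[0]:.3f}")
```

The slope moves toward zero as alpha grows, and with a very small alpha it approaches the ordinary least-squares slope.
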
Lasso Regression:
- Initializing the Lasso Model:
  - Create an instance of the Lasso class with a specified alpha value (e.g., alpha=0.1). The alpha parameter controls the L1 regularization strength.
- Training the Model:
  - Train the Lasso model using the training data (X_train, y_train).
- Making Predictions:
  - Use the trained model to predict house prices for the testing set X_test.
- Evaluating the Model:
  - Calculate the mean squared error (MSE) between the actual (y_test) and predicted values (lasso_pred).
  - Print the Lasso mean squared error.
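
The feature-selection behavior of Lasso only shows up with more than one feature. The sketch below uses synthetic data (an assumption for illustration, not the house-price data) in which the target depends on the first feature only, and compares the fitted coefficients of Lasso and Ridge.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumed for illustration): the target depends only on
# the first feature; the second feature is unrelated noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print("Lasso coefficients:", lasso.coef_)
print("Ridge coefficients:", ridge.coef_)
```

Here Lasso sets the coefficient of the irrelevant feature to exactly zero, while Ridge leaves it small but nonzero.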

Complete Code Example

# Import libraries
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

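# Prepare data (illustrative placeholder values: sizes in square feet, prices in dollars)
X = np.array([[1400], [1600], [1700], [1875], [1100], [1550], [2350], [2450], [1425], [1700]])
y = np.array([245000, 312000, 279000, 308000, 199000, 219000, 405000, 324000, 319000, 255000])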

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Ridge Regression
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, y_train)
ridge_pred = ridge_model.predict(X_test)
ridge_mse = mean_squared_error(y_test, ridge_pred)
print("Ridge Mean Squared Error:", ridge_mse)

# Lasso Regression
lasso_model = Lasso(alpha=0.1)
lasso_model.fit(X_train, y_train)
lasso_pred = lasso_model.predict(X_test)
lasso_mse = mean_squared_error(y_test, lasso_pred)
print("Lasso Mean Squared Error:", lasso_mse)
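
One way to judge the impact of regularization is to fit an unregularized baseline alongside the two models. The sketch below is a variation on the code above, not part of the original walkthrough: the LinearRegression baseline, the illustrative data values, and the use of root mean squared error (RMSE, which is in price units rather than squared price units) are all assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Illustrative values only (assumed for this sketch).
X = np.array([[1400], [1600], [1700], [1875], [1100],
              [1550], [2350], [2450], [1425], [1700]])
y = np.array([245000, 312000, 279000, 308000, 199000,
              219000, 405000, 324000, 319000, 255000])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear": LinearRegression(),  # unregularized baseline
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    results[name] = float(np.sqrt(mse))  # RMSE, in the same units as price
    print(f"{name} RMSE: {results[name]:.2f}")
```

On raw square-foot values like these, the three errors come out nearly identical, because penalties of this size are tiny relative to the spread of the feature. Standardizing the features before fitting, or trying larger alpha values, is a common way to make the regularization effect visible.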

Conclusion

This article provides a practical example of implementing Ridge and Lasso regression for predicting house prices and compares their mean squared errors to understand the impact of regularization on model performance. It helps in understanding how Ridge (L2 regularization) and Lasso (L1 regularization) can be used to prevent overfitting in linear regression models.

School of AI | Newsletter