Part 2 - Implementing Ridge and Lasso Regression in Python
Machine Learning Algorithms - Regularization Techniques for Predicting House Prices with scikit-learn
This article explains how to implement Ridge and Lasso regression in Python to predict house prices, using regularization techniques to prevent overfitting. It covers importing necessary libraries, preparing data, training the models, making predictions, and evaluating performance.
Introduction to Ridge and Lasso Regression
Ridge and Lasso regression are regularization techniques applied to linear regression to prevent overfitting by penalizing large coefficients. Ridge regression adds an L2 penalty (sum of squared coefficients), while Lasso regression adds an L1 penalty (sum of absolute values of coefficients), which can lead to feature selection by shrinking some coefficients to zero.
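The difference between the two penalties is easiest to see on data where some features are irrelevant. The following is a minimal sketch on synthetic data (the dataset, the names X_demo and y_demo, and the alpha values are assumptions made for illustration, not part of the house price example). Only the first two of five features influence the target.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Synthetic data (assumption for illustration): 5 features, of which
# only the first two actually influence the target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

# Fit both regularized models on the same data.
ridge_demo = Ridge(alpha=1.0).fit(X_demo, y_demo)
lasso_demo = Lasso(alpha=0.5).fit(X_demo, y_demo)

# Ridge shrinks every coefficient but keeps all of them non-zero;
# Lasso can set the coefficients of irrelevant features exactly to zero.
print("Ridge coefficients:", ridge_demo.coef_)
print("Lasso coefficients:", lasso_demo.coef_)
print("Non-zero Lasso coefficients:", np.count_nonzero(lasso_demo.coef_))
```

On data like this, Lasso typically keeps only the informative features, while Ridge assigns small but non-zero weights to the rest.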
Step-by-Step Implementation
Importing Libraries:
Import the Ridge and Lasso classes from sklearn.linear_model.
Import train_test_split from sklearn.model_selection to split the dataset into training and testing sets.
Import mean_squared_error from sklearn.metrics to evaluate the models.
Import numpy for numerical computations.
Preparing Data:
Create a NumPy array X representing house sizes in square feet.
Create a NumPy array y representing house prices.
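One detail worth illustrating is the array shape. scikit-learn estimators expect X to be two-dimensional, with one row per sample and one column per feature, while y can stay one-dimensional. The sketch below uses hypothetical sizes and prices (the values and the names sizes and prices are illustrative assumptions).

```python
import numpy as np

# Hypothetical house sizes in square feet (illustrative values only).
sizes = np.array([1100, 1400, 1600, 1875, 2350])
# Hypothetical sale prices, one per house (illustrative values only).
prices = np.array([199000, 245000, 312000, 308000, 405000])

# A single feature must be a column of shape (n_samples, 1),
# not a flat vector of shape (n_samples,).
X = sizes.reshape(-1, 1)
y = prices

print(X.shape)  # (5, 1)
print(y.shape)  # (5,)
```

Writing the array literal with one inner list per house, such as [[1100], [1400]], produces the same two-dimensional shape without the reshape call.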
Splitting the Data:
Use train_test_split to divide the data into training and testing sets.
Specify test_size=0.2 to use 20% of the data for testing and 80% for training.
Set random_state=42 for reproducibility.
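To make the effect of these arguments concrete, here is a small sketch on ten placeholder samples (the arrays are stand-ins, not real house data). It checks the resulting set sizes and confirms that a fixed random_state gives the same split on every call.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten placeholder samples with a single feature (stand-in data).
X = np.arange(10).reshape(-1, 1)
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 8 2

# Repeating the call with the same random_state reproduces the split.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print(np.array_equal(X_test, X_test_again))  # True
```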
Ridge Regression:
Initializing the Ridge Model:
Create an instance of the Ridge class with a specified alpha value (e.g., alpha=1.0). The alpha parameter controls the strength of the regularization: larger values shrink the coefficients more strongly.
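The role of alpha can be seen by fitting the same data with several values. The sketch below uses synthetic single-feature data (the data, the names X_demo and y_demo, and the list of alphas are assumptions chosen for illustration) and prints the fitted coefficient for each setting.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic single-feature data with a true slope of 2 (illustrative).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 1))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=50)

# Fit one Ridge model per alpha and record the learned coefficient.
alphas = [0.01, 1.0, 100.0, 10000.0]
coefs = [Ridge(alpha=a).fit(X_demo, y_demo).coef_[0] for a in alphas]

# The coefficient moves toward zero as alpha grows.
for a, c in zip(alphas, coefs):
    print(f"alpha={a:>8}: coefficient={c:.4f}")
```

Because the penalty acts on the raw coefficients, its practical effect depends on the scale of the features. With a feature measured in square feet, a value such as alpha=1.0 may change the fit very little.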
Training the Model:
Train the Ridge model using the training data (X_train, y_train).
Making Predictions:
Use the trained model to predict house prices for the testing set X_test.
Evaluating the Model:
Calculate the mean squared error (MSE) between the actual values (y_test) and the predicted values (y_pred, named ridge_pred in the complete code example).
Print the Ridge mean squared error.
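As a quick check on what the metric measures, the following sketch computes MSE by hand and compares it with mean_squared_error. The arrays y_true and y_hat are placeholder values invented for the example, not model output.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Placeholder actual and predicted prices (illustrative values only).
y_true = np.array([200000.0, 300000.0, 250000.0])
y_hat = np.array([210000.0, 290000.0, 250000.0])

# MSE is the average of the squared prediction errors.
mse_manual = np.mean((y_true - y_hat) ** 2)
mse_sklearn = mean_squared_error(y_true, y_hat)

# The square root (RMSE) is in the same unit as the target.
rmse = np.sqrt(mse_sklearn)

print(mse_manual, mse_sklearn, rmse)
```

Since MSE is expressed in squared price units, the root mean squared error is often easier to interpret when the target is a price.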
Lasso Regression:
Initializing the Lasso Model:
Create an instance of the Lasso class with a specified alpha value (e.g., alpha=0.1). The alpha parameter controls the L1 regularization strength.
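For Lasso, increasing alpha not only shrinks coefficients but also removes features. The sketch below counts the non-zero coefficients for several alpha values on synthetic data (the data, the true coefficients 3, -2 and 0.5, and the alpha grid are assumptions made for illustration).

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (illustrative): three informative features with true
# coefficients 3, -2 and 0.5, plus two features with no effect.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = (
    3.0 * X_demo[:, 0]
    - 2.0 * X_demo[:, 1]
    + 0.5 * X_demo[:, 2]
    + rng.normal(scale=0.1, size=200)
)

# Count how many coefficients survive at each regularization strength.
alphas = [0.01, 0.1, 1.0, 10.0]
counts = []
for a in alphas:
    model = Lasso(alpha=a).fit(X_demo, y_demo)
    counts.append(int(np.count_nonzero(model.coef_)))
    print(f"alpha={a}: non-zero coefficients = {counts[-1]}")
```

With a large enough alpha every coefficient is driven to zero and the model predicts only the intercept, so the value is usually tuned rather than fixed in advance.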
Training the Model:
Train the Lasso model using the training data (X_train, y_train).
Making Predictions:
Use the trained model to predict house prices for the testing set X_test.
Evaluating the Model:
Calculate the mean squared error (MSE) between the actual values (y_test) and the predicted values (y_pred, named lasso_pred in the complete code example).
Print the Lasso mean squared error.
Complete Code Example
# Import libraries
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np
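# Prepare data
# NOTE: the original values are missing from this listing. The numbers
# below are illustrative placeholders so that the example runs.
# X: house sizes in square feet, one row per house (shape (10, 1))
X = np.array([[1400], [1600], [1700], [1875], [1100],
              [1550], [2350], [2450], [1425], [1700]])
# y: house prices, one value per house (shape (10,))
y = np.array([245000, 312000, 279000, 308000, 199000,
              219000, 405000, 324000, 319000, 255000])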
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Ridge Regression
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, y_train)
ridge_pred = ridge_model.predict(X_test)
ridge_mse = mean_squared_error(y_test, ridge_pred)
print("Ridge Mean Squared Error:", ridge_mse)
# Lasso Regression
lasso_model = Lasso(alpha=0.1)
lasso_model.fit(X_train, y_train)
lasso_pred = lasso_model.predict(X_test)
lasso_mse = mean_squared_error(y_test, lasso_pred)
print("Lasso Mean Squared Error:", lasso_mse)
Conclusion
This article provides a practical example of implementing Ridge and Lasso regression for predicting house prices and compares their mean squared errors to understand the impact of regularization on model performance. It helps in understanding how Ridge (L2 regularization) and Lasso (L1 regularization) can be used to prevent overfitting in linear regression models.