Part 3 - Implementing Polynomial Regression in Python
Machine Learning Algorithms - Fitting Nonlinear Relationships with scikit-learn
This article explains how to implement polynomial regression in Python to model nonlinear relationships between input features and a target variable. It covers importing necessary libraries, preparing data, transforming features into polynomial terms, training the model, making predictions, and evaluating performance.
Introduction to Polynomial Regression
Polynomial regression is an extension of linear regression that models the relationship between the input features and the target variable as an nth-degree polynomial. It can capture nonlinear relationships in the data by adding polynomial terms to the features.
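For a single input feature x, the degree-n model described above can be written out as follows. The coefficient symbols are notation chosen for this sketch and do not come from the article's code.

```latex
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_n x^n + \varepsilon
```

The model is nonlinear in x but still linear in the coefficients, which is why an ordinary linear regression can fit it once the powers of x are supplied as separate features.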
Step-by-Step Implementation
Importing Libraries:
Import the LinearRegression class from sklearn.linear_model.
Import the PolynomialFeatures class from sklearn.preprocessing.
Import train_test_split from sklearn.model_selection to split the dataset into training and testing sets.
Import mean_squared_error from sklearn.metrics to evaluate the model.
Import numpy for numerical operations.
Preparing Data:
Create a NumPy array X representing the years of experience.
Create a NumPy array y representing the corresponding salaries.
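One detail worth checking at this step is the shape of X: scikit-learn estimators and transformers expect a two-dimensional array with one row per sample and one column per feature. The sketch below uses hypothetical values and hypothetical variable names (years, X_demo) to show the reshape.

```python
import numpy as np

# Hypothetical years-of-experience values, used only for illustration
years = np.array([1, 2, 3, 4, 5])

# Reshape the flat array into a single-column 2D array:
# shape (n_samples, n_features)
X_demo = years.reshape(-1, 1)

print(years.shape)   # (5,)
print(X_demo.shape)  # (5, 1)
```

The target y can stay one-dimensional.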
Splitting the Data:
Use train_test_split to divide the data into training and testing sets.
Specify test_size=0.2 to use 20% of the data for testing and 80% for training.
Set random_state=42 for reproducibility.
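To make the effect of these arguments concrete, here is a minimal sketch on hypothetical data with 10 samples (the names X_demo, y_demo, X_tr and so on are placeholders for this example). With test_size=0.2, two samples land in the test set and eight in the training set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples, one feature
X_demo = np.arange(1, 11).reshape(-1, 1)
y_demo = np.arange(10, 110, 10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)  # (8, 1) (2, 1)

# Repeating the call with the same random_state gives the same split
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(np.array_equal(X_te, X_te2))  # True
```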
Transforming Features into Polynomial Features:
Initializing PolynomialFeatures:
Create an instance of the PolynomialFeatures class with a specified degree (e.g., degree=2). Applied to the feature X, this produces second-degree polynomial features: a constant term, X itself, and X squared. The squared term is what lets the model represent the nonlinear relationship.
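A small sketch of what degree=2 produces for a single feature, using two hypothetical input values. Each row becomes [1, x, x^2]; the leading 1 is the bias column that PolynomialFeatures adds by default.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two hypothetical samples with one feature each
X_demo = np.array([[2.0], [3.0]])

poly_demo = PolynomialFeatures(degree=2)
X_demo_poly = poly_demo.fit_transform(X_demo)

# Columns are [1, x, x^2]
print(X_demo_poly.tolist())  # [[1.0, 2.0, 4.0], [1.0, 3.0, 9.0]]
```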
Transforming the Data:
Transform the X_train data into polynomial features using fit_transform.
Transform the X_test data into polynomial features using transform.
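The pattern here is to fit the transformer once on the training data and then reuse that same fitted object on the test data. A minimal sketch with hypothetical arrays (X_tr and X_te stand in for X_train and X_test):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical training and test inputs
X_tr = np.array([[1.0], [2.0], [3.0]])
X_te = np.array([[4.0]])

poly_demo = PolynomialFeatures(degree=2)
X_tr_poly = poly_demo.fit_transform(X_tr)  # fit on training data, then transform it
X_te_poly = poly_demo.transform(X_te)      # reuse the fitted transformer, no refit

print(X_tr_poly.shape, X_te_poly.shape)  # (3, 3) (1, 3)
print(X_te_poly.tolist())                # [[1.0, 4.0, 16.0]]
```

Both arrays end up with the same three columns, which is what the regression model needs at prediction time.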
Initializing and Training the Model:
Create an instance of the LinearRegression model.
Train the model using the polynomial-transformed training data (X_train_poly, y_train).
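To see what the fitted model has learned, the sketch below trains on hypothetical noise-free data generated from y = 3 + 2x + 0.5x^2 and inspects intercept_ and coef_. The data and the names demo_model, X_demo and y_demo are assumptions made for this illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noise-free data generated from y = 3 + 2x + 0.5x^2
X_demo = np.arange(1, 9, dtype=float).reshape(-1, 1)
y_demo = 3 + 2 * X_demo.ravel() + 0.5 * X_demo.ravel() ** 2

X_demo_poly = PolynomialFeatures(degree=2).fit_transform(X_demo)
demo_model = LinearRegression().fit(X_demo_poly, y_demo)

# coef_ lines up with the columns [1, x, x^2]. The constant term is carried
# by intercept_, so the coefficient on the bias column is effectively zero.
print(np.round(demo_model.intercept_, 3))
print(np.round(demo_model.coef_, 3))
```

On this data the intercept comes out close to 3 and the coefficients on x and x^2 close to 2 and 0.5, matching the generating formula.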
Making Predictions:
Use the trained model to predict salaries for the polynomial-transformed testing set X_test_poly.
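The same rule applies to any new input: it has to pass through the fitted transformer before it reaches predict. The following sketch uses hypothetical data following y = x^2 + 1 and also shows that the raw, untransformed input is rejected.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noise-free training data following y = x^2 + 1
X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_demo = X_demo.ravel() ** 2 + 1

poly_demo = PolynomialFeatures(degree=2)
demo_model = LinearRegression().fit(poly_demo.fit_transform(X_demo), y_demo)

# A new input must go through the same fitted transformer before predict
new_x = np.array([[6.0]])
new_pred = demo_model.predict(poly_demo.transform(new_x))
print(np.round(new_pred, 3))  # close to 37, since 6^2 + 1 = 37

# Passing the raw single-column input fails: the model was trained on 3 columns
try:
    demo_model.predict(new_x)
    raw_input_rejected = False
except ValueError:
    raw_input_rejected = True
print(raw_input_rejected)
```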
Evaluating the Model:
Calculate the mean squared error (MSE) between the actual (y_test) and predicted (y_pred) values.
Print the mean squared error and predicted values.
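MSE is the average of the squared differences between actual and predicted values. As a quick check, the sketch below computes it by hand on three hypothetical values and compares the result with mean_squared_error. The square root (RMSE) is included as an assumption about what readers may find easier to interpret, since it is in the same units as the target.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted values
y_true_demo = np.array([3.0, 5.0, 7.0])
y_pred_demo = np.array([2.0, 5.0, 9.0])

mse_sklearn = mean_squared_error(y_true_demo, y_pred_demo)

# Errors are 1, 0 and -2, so the squared errors are 1, 0 and 4
mse_manual = np.mean((y_true_demo - y_pred_demo) ** 2)
rmse = np.sqrt(mse_manual)

print(np.isclose(mse_sklearn, mse_manual))  # True
print(round(float(mse_manual), 4))          # 1.6667
print(round(float(rmse), 4))                # 1.291
```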
Complete Code Example
# Import libraries
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np
# Prepare data
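# Illustrative values, assumed for this example: years of experience (2D, one column)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
# Illustrative salaries, assumed for this example, growing faster than linearly
y = np.array([30000, 32000, 35000, 40000, 47000, 56000, 67000, 80000, 95000, 112000])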
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Transform features into polynomial features
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)
# Initialize and train the model
model = LinearRegression()
model.fit(X_train_poly, y_train)
# Make predictions
y_pred = model.predict(X_test_poly)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
# Print results
print("Mean Squared Error:", mse)
print("Predicted values:", y_pred)
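As an alternative arrangement, not the one used in the code above, the transformer and the regression can be wrapped in a single scikit-learn pipeline with make_pipeline, so that fit and predict apply the polynomial expansion automatically. The sketch below uses hypothetical data following y = x^2 and checks that both routes give the same prediction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical training and test data following y = x^2
X_tr = np.array([[1.0], [2.0], [3.0], [4.0]])
y_tr = np.array([1.0, 4.0, 9.0, 16.0])
X_te = np.array([[5.0]])

# Manual route: fit the transformer on training data, reuse it on test data
poly_demo = PolynomialFeatures(degree=2)
manual_model = LinearRegression().fit(poly_demo.fit_transform(X_tr), y_tr)
manual_pred = manual_model.predict(poly_demo.transform(X_te))

# Pipeline route: the same two steps wrapped in one estimator
pipe = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
pipe.fit(X_tr, y_tr)
pipe_pred = pipe.predict(X_te)

print(np.allclose(manual_pred, pipe_pred))  # True
```

The pipeline removes the chance of forgetting to transform the test set, at the cost of hiding the intermediate polynomial features.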
Conclusion
This article demonstrates how to use polynomial regression to fit a nonlinear relationship between years of experience and salary. By transforming the features into polynomial terms, a simple linear regression model can fit a curve to the data. This approach allows for capturing more complex relationships compared to simple linear regression.