Part 3 - Implementing Polynomial Regression in Python

Machine Learning Algorithms - Fitting Nonlinear Relationships with scikit-learn

Feb 06, 2025

This article explains how to implement polynomial regression in Python to model nonlinear relationships between input features and a target variable. It covers importing necessary libraries, preparing data, transforming features into polynomial terms, training the model, making predictions, and evaluating performance.

Introduction to Polynomial Regression

Polynomial regression is an extension of linear regression that models the relationship between the input features and the target variable as an nth-degree polynomial. It captures nonlinear relationships by adding powers of the input features (x², x³, and so on) as extra features. The model remains linear in its coefficients, so it can still be fitted with ordinary linear regression.
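A minimal NumPy-only sketch of this idea follows. It assumes made-up, noise-free data generated from y = 2 + 3x + 0.5x² (these numbers are not from the article) and fits the three coefficients with ordinary least squares on the columns 1, x, and x².

```python
import numpy as np

# Hypothetical, noise-free data from y = 2 + 3x + 0.5x^2 (not from the article).
x = np.arange(1.0, 7.0)
y = 2.0 + 3.0 * x + 0.5 * x**2

# Design matrix with one column per polynomial term: [1, x, x^2].
A = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares on the expanded features.
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coeffs, 6))  # approximately the generating coefficients 2, 3, 0.5
```

The curve is nonlinear in x, but the fit itself is an ordinary linear least-squares problem over the expanded columns.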

Step-by-Step Implementation

Importing Libraries:
- Import the LinearRegression class from sklearn.linear_model.
- Import the PolynomialFeatures class from sklearn.preprocessing.
- Import train_test_split from sklearn.model_selection to split the dataset into training and testing sets.
- Import mean_squared_error from sklearn.metrics to evaluate the model.
- Import numpy for numerical operations.
Preparing Data:
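- Create a NumPy array X holding the years of experience, shaped as a 2D array (one row per sample, one column per feature), which is the layout scikit-learn estimators expect.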
- Create a NumPy array y representing the corresponding salaries.
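As a small sketch of the expected input layout, the snippet below reshapes a flat array into a single-feature column. The values and the name `years` are placeholders chosen for illustration.

```python
import numpy as np

# Hypothetical values for illustration only.
years = np.array([1, 2, 3, 4, 5])

# scikit-learn estimators expect X with shape (n_samples, n_features).
X = years.reshape(-1, 1)

print(years.shape)  # (5,)
print(X.shape)      # (5, 1)
```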
Splitting the Data:
- Use train_test_split to divide the data into training and testing sets.
- Specify test_size=0.2 to use 20% of the data for testing and 80% for training.
- Set random_state=42 for reproducibility.
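The effect of these settings can be checked on placeholder data. This sketch assumes 10 samples with one feature (illustrative values only) and prints the resulting shapes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 10 samples, one feature (values are illustrative).
X = np.arange(1, 11).reshape(-1, 1)
y = np.arange(10)  # y[i] == X[i] - 1, so row pairing is easy to verify

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (8, 1) (2, 1)
```

Rows of X and y are shuffled together, so each target stays paired with its feature row, and the fixed random_state makes repeated runs produce the same split.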
Transforming Features into Polynomial Features:
- Initializing PolynomialFeatures:
  - Create an instance of the PolynomialFeatures class with a specified degree (e.g., degree=2). With a single input feature x and the default settings, each sample is expanded to the three columns 1, x, and x²: a constant bias column, the original feature, and the squared term that lets the model represent curvature.
- Transforming the Data:
  - Fit the transformer on X_train and transform it in one step using fit_transform.
  - Apply the already fitted transformer to X_test using transform, so the training and testing sets get the same feature layout.
Initializing and Training the Model:
- Create an instance of the LinearRegression model.
- Train the model using the polynomial-transformed training data (X_train_poly, y_train).
Making Predictions:
- Use the trained model to predict salaries for the polynomial-transformed testing set X_test_poly.
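The same rule applies to any new input: it has to pass through the fitted transformer before it reaches the model. The sketch below assumes hypothetical noise-free data from y = 2 + 3x + 0.5x², so the prediction can be checked by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noise-free data from y = 2 + 3x + 0.5x^2 (not from the article).
X = np.arange(1.0, 9.0).reshape(-1, 1)
y = 2.0 + 3.0 * X.ravel() + 0.5 * X.ravel() ** 2

poly = PolynomialFeatures(degree=2)
model = LinearRegression().fit(poly.fit_transform(X), y)

# A new input must be transformed with the same fitted transformer.
X_new = np.array([[10.0]])
y_new = model.predict(poly.transform(X_new))

print(y_new)  # close to 2 + 3*10 + 0.5*100 = 82
```

Passing the raw X_new straight to model.predict would fail, because the model was trained on three columns rather than one.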
Evaluating the Model:
- Calculate the mean squared error (MSE) between the actual (y_test) and predicted (y_pred) values.
- Print the mean squared error and predicted values.
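For reference, the MSE is the average of the squared differences between actual and predicted values. This sketch uses three made-up pairs to show that a manual computation matches mean_squared_error, and adds the root mean squared error (RMSE), which is in the same units as the target.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up values for illustration only.
y_test = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

# Errors are 1, 0, -2, so the squared errors are 1, 0, 4 and their mean is 5/3.
mse_manual = np.mean((y_test - y_pred) ** 2)
mse = mean_squared_error(y_test, y_pred)

# RMSE puts the error back on the scale of the target variable.
rmse = np.sqrt(mse)

print(mse_manual, mse, rmse)
```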

Complete Code Example

# Import libraries
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Prepare data
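# NOTE: the original values are missing from this listing; the numbers below
# are illustrative placeholders, not the article's data.
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])  # years of experience, shape (10, 1)
y = np.array([30000, 35000, 42000, 50000, 60000, 71000, 85000, 100000, 118000, 138000])  # salaries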

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Transform features into polynomial features
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Initialize and train the model
model = LinearRegression()
model.fit(X_train_poly, y_train)

# Make predictions
y_pred = model.predict(X_test_poly)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)

# Print results
print("Mean Squared Error:", mse)
print("Predicted values:", y_pred)
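As an optional variation that the article itself does not use, the transform and the model can be chained with scikit-learn's make_pipeline, so a single object handles both steps. The data here is a hypothetical noise-free quadratic, y = 1 + 2x + 0.25x², chosen so the result is easy to verify.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noise-free data from y = 1 + 2x + 0.25x^2 (not from the article).
X = np.arange(1.0, 11.0).reshape(-1, 1)
y = 1.0 + 2.0 * X.ravel() + 0.25 * X.ravel() ** 2

# The pipeline applies PolynomialFeatures, then LinearRegression.
pipeline = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
pipeline.fit(X, y)

# Raw inputs go in directly; the pipeline transforms them before predicting.
pred = pipeline.predict(np.array([[12.0]]))
print(pred)  # close to 1 + 2*12 + 0.25*144 = 61
```

With this design there is no separate X_train_poly or X_test_poly to keep track of, which removes the risk of forgetting to transform new inputs.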


Conclusion

This article demonstrates how to use polynomial regression to fit a nonlinear relationship between years of experience and salary. By transforming the features into polynomial terms, a simple linear regression model can fit a curve to the data. This approach allows for capturing more complex relationships compared to simple linear regression.
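The comparison with simple linear regression can be sketched on synthetic data. The snippet assumes a quadratic trend plus small random noise (an illustrative setup, not the article's dataset) and compares the training-set MSE of a straight-line fit with that of a degree-2 fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): quadratic trend plus noise.
rng = np.random.default_rng(0)
X = np.arange(1.0, 11.0).reshape(-1, 1)
y = 30.0 + 2.0 * X.ravel() + 1.5 * X.ravel() ** 2 + rng.normal(0.0, 1.0, size=10)

# Straight-line fit on the raw feature.
linear = LinearRegression().fit(X, y)
mse_linear = mean_squared_error(y, linear.predict(X))

# Same estimator on degree-2 polynomial features.
X_poly = PolynomialFeatures(degree=2).fit_transform(X)
quadratic = LinearRegression().fit(X_poly, y)
mse_quadratic = mean_squared_error(y, quadratic.predict(X_poly))

print("Linear MSE:   ", mse_linear)
print("Quadratic MSE:", mse_quadratic)
```

This is an in-sample comparison. Adding polynomial terms never increases the training error, so a held-out test set is still needed to judge whether a higher degree actually generalizes better.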

School of AI | Newsletter
