Part 3 - Implementing Polynomial Regression in Python
Machine Learning Algorithms - Fitting Nonlinear Relationships with scikit-learn
This article explains how to implement polynomial regression in Python to model nonlinear relationships between input features and a target variable. It covers importing necessary libraries, preparing data, transforming features into polynomial terms, training the model, making predictions, and evaluating performance.
Introduction to Polynomial Regression
Polynomial regression is an extension of linear regression that models the relationship between the input features and the target variable as an nth-degree polynomial. It can capture nonlinear relationships in the data by adding polynomial terms to the features.
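For a single input feature x, the degree-n model described above can be written out as follows. The coefficient symbols (beta) are notation chosen here for illustration and do not appear in the original article.

```latex
\hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_n x^n
```

The model is nonlinear in x but still linear in the coefficients, which is why an ordinary linear regression can fit it once the powers of x are supplied as extra features.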
Step-by-Step Implementation
Importing Libraries:
Import the LinearRegression class from sklearn.linear_model.
Import the PolynomialFeatures class from sklearn.preprocessing.
Import train_test_split from sklearn.model_selection to split the dataset into training and testing sets.
Import mean_squared_error from sklearn.metrics to evaluate the model.
Import numpy for numerical operations.
Preparing Data:
Create a NumPy array X representing the years of experience.
Create a NumPy array y representing the corresponding salaries.
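scikit-learn estimators expect the feature array to be two-dimensional (one row per sample, one column per feature), while the target can stay one-dimensional. The sketch below shows one way to get those shapes; the numbers are hypothetical and are not the article's dataset.

```python
import numpy as np

# Hypothetical values for illustration only; not the article's dataset.
years = np.array([1, 2, 3, 4, 5])    # 1-D array, shape (5,)
X = years.reshape(-1, 1)             # 2-D column, shape (5, 1): one feature per row
y = np.array([30, 35, 45, 60, 80])   # 1-D target, e.g. salary in thousands

print(X.shape, y.shape)
```

Passing the one-dimensional years array directly to a scikit-learn transformer or estimator would raise an error, so the reshape step is worth doing explicitly.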
Splitting the Data:
Use train_test_split to divide the data into training and testing sets.
Specify test_size=0.2 to use 20% of the data for testing and 80% for training.
Set random_state=42 for reproducibility.
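As a quick check of what these arguments do, the following sketch splits a hypothetical 10-sample dataset (placeholder values, not the article's data) and confirms that reusing the same random_state reproduces the same partition.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical 10-sample dataset, used only to show the split sizes.
X = np.arange(1, 11).reshape(-1, 1)
y = np.arange(10, 110, 10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # 8 training rows, 2 testing rows

# Repeating the call with the same random_state yields the same test rows.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
```

With only ten samples the test set holds just two points, so any error measured on it is a rough estimate.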
Transforming Features into Polynomial Features:
Initializing PolynomialFeatures:
Create an instance of the PolynomialFeatures class with a specified degree (e.g., degree=2). For a single feature x, this expands X into the columns 1, x, and x^2; the squared term is what lets the model represent a curved relationship.
Transforming the Data:
Transform the X_train data into polynomial features using fit_transform.
Transform the X_test data into polynomial features using transform, which reuses the transformer already fitted on the training data.
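To make the transformation concrete, here is a small sketch on hypothetical inputs (the values 2, 3, and 4 are chosen only for illustration). It prints the expanded training matrix so the constant, linear, and squared columns are visible.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical single-feature inputs, for illustration only.
X_train = np.array([[2], [3]])
X_test = np.array([[4]])

poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)  # fit on training data, then expand it
X_test_poly = poly.transform(X_test)        # expand test data with the same transformer

print(X_train_poly)  # columns are: 1, x, x^2
```

Each input row gains a constant column and a squared column, so a model trained on the expanded matrix sees three features instead of one.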
Initializing and Training the Model:
Create an instance of the LinearRegression model.
Train the model using the polynomial-transformed training data (X_train_poly, y_train).
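One way to see that training on polynomial features really fits a curve is to generate data from a known quadratic and check that the learned coefficients match it. The sketch below does this on synthetic, noise-free data (an assumption made for illustration). It also sets include_bias=False, a choice made here rather than in the article, so that the constant term is reported only through intercept_.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free data generated from y = 1 + 2x + 3x^2 (hypothetical).
X_train = np.arange(6, dtype=float).reshape(-1, 1)
y_train = 1 + 2 * X_train[:, 0] + 3 * X_train[:, 0] ** 2

# include_bias=False drops the constant column; LinearRegression fits the
# intercept itself, so the constant term shows up in model.intercept_.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train)

model = LinearRegression()
model.fit(X_train_poly, y_train)

print(model.intercept_, model.coef_)  # close to 1 and [2, 3]
```

With the default include_bias=True the fit is the same, but the coefficient array has an extra entry for the constant column.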
Making Predictions:
Use the trained model to predict salaries for the polynomial-transformed testing set X_test_poly.
Evaluating the Model:
Calculate the mean squared error (MSE) between the actual (y_test) and predicted (y_pred) values.
Print the mean squared error and predicted values.
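The MSE is the average of the squared differences between actual and predicted values. The sketch below computes it both with mean_squared_error and by hand on two hypothetical points, and also takes the square root (RMSE), which is not part of the article's code but puts the error back in the target's units.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted values, for illustration only.
y_test = np.array([3.0, 5.0])
y_pred = np.array([2.5, 5.5])

mse = mean_squared_error(y_test, y_pred)      # library computation
mse_manual = np.mean((y_test - y_pred) ** 2)  # same quantity by hand
rmse = np.sqrt(mse)                           # error in the units of y

print(mse, mse_manual, rmse)
```

Because the errors are squared, MSE is expressed in squared units (for salaries, squared currency), which is why RMSE is often easier to interpret.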
Complete Code Example
# Import libraries
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np
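# Prepare data
# NOTE: the original values are missing from this copy of the article; the
# numbers below are illustrative placeholders, not the author's dataset.
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])  # years of experience
y = np.array([30000, 35000, 42000, 51000, 62000, 75000, 90000, 107000, 126000, 147000])  # salaries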
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Transform features into polynomial features
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)
# Initialize and train the model
model = LinearRegression()
model.fit(X_train_poly, y_train)
# Make predictions
y_pred = model.predict(X_test_poly)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
# Print results
print("Mean Squared Error:", mse)
print("Predicted values:", y_pred)
Conclusion
This article demonstrates how to use polynomial regression to fit a nonlinear relationship between years of experience and salary. By transforming the features into polynomial terms, a simple linear regression model can fit a curve to the data. This approach allows for capturing more complex relationships compared to simple linear regression.
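The comparison with simple linear regression can be sketched directly. The example below fits a straight line and a degree-2 polynomial to the same synthetic, noise-free curved data (hypothetical values, not the article's dataset) and compares their errors.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical curved relationship, for illustration only.
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = 0.5 * X[:, 0] ** 2 + 20

# Straight-line fit on the raw feature.
linear = LinearRegression().fit(X, y)
mse_linear = mean_squared_error(y, linear.predict(X))

# Degree-2 fit on the expanded features.
X_poly = PolynomialFeatures(degree=2).fit_transform(X)
quadratic = LinearRegression().fit(X_poly, y)
mse_poly = mean_squared_error(y, quadratic.predict(X_poly))

print("Linear MSE:", mse_linear)
print("Polynomial MSE:", mse_poly)
```

These are training errors on noise-free data, so the polynomial fit is essentially exact. On real data the comparison should be made on a held-out test set, since a higher degree can also fit noise.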

