Part 1 - Implementing Linear Regression in Python
Machine Learning Algorithms - A Step-by-Step Guide to Predicting House Prices Using scikit-learn
This article walks through implementing a linear regression model in Python to predict house prices based on house size. It covers importing necessary libraries, preparing data, training the model, making predictions, and evaluating performance.
Introduction to Linear Regression
Linear regression is a supervised learning algorithm used to predict a continuous target variable based on one or more input features. It finds the line of best fit by minimizing the sum of squared differences between the actual and predicted values.
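To make "minimizing the sum of squared differences" concrete, here is a minimal sketch of the closed-form least-squares solution for a single feature. The data values and the names sizes, prices, slope and intercept are hypothetical and chosen only for illustration; they are not part of the article's dataset.

```python
import numpy as np

# Hypothetical data: prices lie exactly on the line price = 2 * size + 10
sizes = np.array([50.0, 60.0, 70.0, 80.0])
prices = 2.0 * sizes + 10.0

# Closed-form least-squares estimates for one input feature
size_mean = sizes.mean()
price_mean = prices.mean()
slope = np.sum((sizes - size_mean) * (prices - price_mean)) / np.sum((sizes - size_mean) ** 2)
intercept = price_mean - slope * size_mean

# Sum of squared differences between actual and predicted values
predicted = slope * sizes + intercept
sse = np.sum((prices - predicted) ** 2)

print("slope:", slope)
print("intercept:", intercept)
print("sum of squared errors:", sse)
```

Because the hypothetical prices fall exactly on a line, the fitted line recovers it and the sum of squared errors is essentially zero. With noisy real data the error would be positive, and the fitted line would be the one that makes it as small as possible.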
Step-by-Step Implementation
Importing Libraries:
Import the LinearRegression class from sklearn.linear_model to implement linear regression.
Import train_test_split from sklearn.model_selection to split the dataset into training and testing sets.
Import mean_squared_error from sklearn.metrics to evaluate the model.
Import numpy for numerical computations.
Preparing Data:
Create a NumPy array X representing house sizes in square meters.
Create a NumPy array y representing house prices.
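One detail worth illustrating is the shape of X: scikit-learn estimators expect a 2-D array of shape (n_samples, n_features), even when there is only one feature. The sketch below assumes the sizes start out as a flat 1-D array; the values and the name sizes are hypothetical.

```python
import numpy as np

# Hypothetical house sizes (square meters) and prices, for illustration only
sizes = np.array([50, 60, 70, 80, 90])
y = np.array([150, 180, 210, 240, 270])

# Turn the flat array into a single-column 2-D array: one row per house
X = sizes.reshape(-1, 1)

print(X.shape)  # (5, 1)
print(y.shape)  # (5,)
```

The target y can stay 1-D. Only the feature array needs the extra dimension.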
Splitting the Data:
Use train_test_split to divide the data into training and testing sets.
Specify test_size=0.2 to use 20% of the data for testing and 80% for training.
Set random_state=42 for reproducibility.
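As a rough check of what these parameters do, the following sketch splits ten hypothetical samples and confirms that repeating the call with the same random_state gives the same split. The data values and the names X_test_again and y_test_again are assumptions made for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten hypothetical samples: sizes 50, 60, ..., 140 and made-up prices
X = (50 + 10 * np.arange(10)).reshape(-1, 1)
y = 3 * X.ravel()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# With 10 samples and test_size=0.2, 2 go to the test set and 8 to training
print(X_train.shape, X_test.shape)

# Repeating the split with the same random_state selects the same rows
_, X_test_again, _, y_test_again = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(np.array_equal(X_test, X_test_again))
```

Without a fixed random_state the rows are shuffled differently on each run, so the reported error can change from one execution to the next.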
Initializing and Training the Model:
Create an instance of the LinearRegression class.
Train the model by calling its fit method on the training data (X_train, y_train).
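After fitting, the learned line can be inspected through the model's coef_ and intercept_ attributes. The sketch below uses hypothetical training data that lies exactly on price = 2 * size + 10, so the fitted values are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data on the line price = 2 * size + 10
X_train = np.array([[50], [60], [70], [80]])
y_train = 2 * X_train.ravel() + 10

model = LinearRegression()
model.fit(X_train, y_train)

# coef_ holds one slope per feature; intercept_ is the constant term
print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
```

For this data the slope should come out close to 2 and the intercept close to 10, up to floating-point rounding.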
Making Predictions:
Use the trained model to predict house prices for the testing set X_test.
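The same predict call also works for sizes the model has not seen, provided the input is a 2-D array like the training data. This is a minimal sketch with hypothetical data; X_new and y_new are names introduced only for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data on the line price = 2 * size + 10
X_train = np.array([[50], [60], [70], [80]])
y_train = 2 * X_train.ravel() + 10

model = LinearRegression().fit(X_train, y_train)

# A single new house of 100 square meters, passed as one row with one column
X_new = np.array([[100]])
y_new = model.predict(X_new)

print(y_new.shape)  # (1,)
print("predicted price:", y_new[0])
```

Since the hypothetical data follows price = 2 * size + 10, the prediction for 100 square meters should be close to 210.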
Evaluating the Model:
Calculate the mean squared error (MSE) between the actual (y_test) and predicted (y_pred) values.
Print the mean squared error and predicted values.
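To show what mean_squared_error computes, the sketch below calculates the same quantity by hand with NumPy and compares the two. The y_test and y_pred values here are hypothetical, and the square root (RMSE) is included only as an optional convenience because it is expressed in the same units as the prices.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted prices
y_test = np.array([200.0, 250.0, 300.0])
y_pred = np.array([210.0, 240.0, 300.0])

# MSE by hand: the average of the squared differences
mse_manual = np.mean((y_test - y_pred) ** 2)

# The same quantity from scikit-learn
mse_sklearn = mean_squared_error(y_test, y_pred)

# Root mean squared error, in the same units as the target
rmse = np.sqrt(mse_sklearn)

print("manual MSE:", mse_manual)
print("sklearn MSE:", mse_sklearn)
print("RMSE:", rmse)
```

Here the errors are -10, 10 and 0, so the squared errors are 100, 100 and 0 and the MSE is 200 / 3, roughly 66.67.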
Complete Code Example
The following code demonstrates the complete workflow for training and evaluating a simple linear regression model:
# Import libraries
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np
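# Prepare data (hypothetical values for illustration)
# X: house sizes in square meters, one row per house; y: house prices
X = np.array([[50], [60], [70], [80], [90], [100], [110], [120], [130], [140]])
y = np.array([150, 180, 210, 240, 270, 300, 330, 360, 390, 420])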
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train the model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
# Print results
print("Mean Squared Error:", mse)
print("Predicted values:", y_pred)
Conclusion
This article provides a step-by-step guide to implementing linear regression in Python using scikit-learn. From data preparation and model training to prediction and performance evaluation, this example offers a practical understanding of how to use linear regression for predicting continuous target variables.