Part 1 - Implementing Linear Regression in Python

Machine Learning Algorithms - A Step-by-Step Guide to Predicting House Prices Using scikit-learn

Feb 06, 2025

This article walks through implementing a linear regression model in Python to predict house prices based on house size. It covers importing necessary libraries, preparing data, training the model, making predictions, and evaluating performance.

Introduction to Linear Regression

Linear regression is a supervised learning algorithm used to predict a continuous target variable based on one or more input features. It finds the line of best fit by minimizing the sum of squared differences between the actual and predicted values.
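To make "minimizing the sum of squared differences" concrete, here is a minimal sketch of the closed-form least-squares fit for a single feature, written with NumPy only. The data values and the helper names fit_line and sum_squared_error are assumptions made up for illustration; they do not come from the article's dataset.

```python
import numpy as np

# Hypothetical data for illustration only: sizes in square meters, prices in arbitrary units.
sizes = np.array([50.0, 60.0, 70.0, 80.0, 90.0])
prices = np.array([152.0, 178.0, 211.0, 239.0, 270.0])


def fit_line(x, y):
    """Return the (slope, intercept) that minimize the sum of squared residuals."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept


def sum_squared_error(x, y, slope, intercept):
    """Sum of squared differences between actual and predicted values."""
    residuals = y - (slope * x + intercept)
    return np.sum(residuals ** 2)


slope, intercept = fit_line(sizes, prices)
best = sum_squared_error(sizes, prices, slope, intercept)
# Nudging the slope away from the fitted value should increase the error.
worse = sum_squared_error(sizes, prices, slope + 0.5, intercept)

print("slope:", slope, "intercept:", intercept)
print("error at fitted line:", best, "error at perturbed line:", worse)
```

Any other slope or intercept gives a larger sum of squared errors on the same data, which is what "line of best fit" means here.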

Step-by-Step Implementation

Importing Libraries:
- Import the LinearRegression class from sklearn.linear_model to implement linear regression.
- Import train_test_split from sklearn.model_selection to split the dataset into training and testing sets.
- Import mean_squared_error from sklearn.metrics to evaluate the model.
- Import numpy for numerical computations.
Preparing Data:
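- Create a two-dimensional NumPy array X representing house sizes in square meters, with one row per house and one column for the size. scikit-learn expects features in the shape (n_samples, n_features).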
- Create a NumPy array y representing house prices.
Splitting the Data:
- Use train_test_split to divide the data into training and testing sets.
- Specify test_size=0.2 to use 20% of the data for testing and 80% for training.
- Set random_state=42 for reproducibility.
Initializing and Training the Model:
- Create an instance of the LinearRegression class.
- Train the model using the training data (X_train, y_train).
Making Predictions:
- Use the trained model to predict house prices for the testing set X_test.
Evaluating the Model:
- Calculate the mean squared error (MSE) between the actual (y_test) and predicted (y_pred) values.
- Print the mean squared error and predicted values.
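
Two details in the steps above are easy to get wrong: the shape of X and the size of the resulting split. The following sketch, which uses made-up values (the names sizes and prices and all numbers are assumptions, not the article's data), shows one way to check both before training.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical values for illustration only.
sizes = np.array([50, 60, 70, 80, 90, 100, 110, 120, 130, 140])
prices = np.array([150, 180, 210, 240, 270, 300, 330, 360, 390, 420])

# scikit-learn estimators expect X with shape (n_samples, n_features),
# so a flat array of sizes is reshaped into a single-column matrix.
X = sizes.reshape(-1, 1)
y = prices

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("X shape:", X.shape)
print("train shape:", X_train.shape, "test shape:", X_test.shape)
```

With ten samples and test_size=0.2, two samples are held out for testing and eight remain for training. Reusing the same random_state reproduces the same split on every run.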

Complete Code Example

The following code demonstrates the complete workflow for training and evaluating a simple linear regression model:

# Import libraries
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Prepare data
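# Illustrative house sizes in square meters (assumed values, one row per house)
X = np.array([[50], [60], [70], [80], [90], [100], [110], [120], [130], [140]])
# Illustrative house prices (assumed values, one per house)
y = np.array([152000, 178000, 211000, 239000, 272000, 298000, 331000, 362000, 388000, 421000])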

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)

# Print results
print("Mean Squared Error:", mse)
print("Predicted values:", y_pred)
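
As a follow-up, it can help to look inside the fitted model and to confirm what mean_squared_error computes. The sketch below is self-contained and uses hypothetical data (all values are assumptions for illustration). It fits and scores on the same six points only to compare the two MSE calculations, not as a substitute for evaluating on a held-out test set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical data for illustration only: sizes in square meters, prices in arbitrary units.
X = np.array([[50], [60], [70], [80], [90], [100]])
y = np.array([152, 178, 211, 239, 270, 301])

model = LinearRegression().fit(X, y)

# The fitted line is: price = slope * size + intercept
slope = model.coef_[0]
intercept = model.intercept_

y_pred = model.predict(X)

# MSE from scikit-learn and the same quantity computed by hand.
mse_sklearn = mean_squared_error(y, y_pred)
mse_manual = np.mean((y - y_pred) ** 2)

# RMSE is in the same units as the target, which is often easier to interpret.
rmse = np.sqrt(mse_manual)

# Predict the price of a new, unseen house size (85 square meters is an arbitrary choice).
new_price = model.predict(np.array([[85]]))[0]

print("slope:", slope, "intercept:", intercept)
print("MSE (sklearn):", mse_sklearn, "MSE (manual):", mse_manual, "RMSE:", rmse)
print("predicted price for 85 square meters:", new_price)
```

The prediction for a new size is just the fitted line evaluated at that size, so new_price equals slope * 85 + intercept up to floating-point rounding.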

Conclusion

This article provides a step-by-step guide to implementing linear regression in Python using scikit-learn. From data preparation and model training to prediction and performance evaluation, this example offers a practical understanding of how to use linear regression for predicting continuous target variables.

School of AI | Newsletter
