Part 4 - Implementing Logistic Regression in Python

Machine Learning Algorithms - Binary Classification with scikit-learn

Feb 06, 2025

This article explains how to implement logistic regression in Python for binary classification problems. It covers importing necessary libraries, preparing data, training the model, making predictions, and evaluating performance using accuracy scores and confusion matrices.

Introduction to Logistic Regression

Logistic regression is a supervised learning algorithm for binary classification. It predicts the probability that an observation belongs to a particular class by passing a linear combination of the input features through the logistic (sigmoid) function, which outputs values between 0 and 1.
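
As a rough illustration of the sigmoid mapping described above, the sketch below applies it to a few made-up scores. The `sigmoid` helper, the `scores` values, and the 0.5 cutoff are assumptions chosen for illustration, not part of scikit-learn's API.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear scores (w * x + b) for three observations
scores = np.array([-2.0, 0.0, 2.0])
probabilities = sigmoid(scores)

# Assumed convention: treat a probability of at least 0.5 as class 1
labels = (probabilities >= 0.5).astype(int)

print(probabilities)
print(labels)
```

A score of 0 maps to a probability of exactly 0.5. Large negative scores approach 0 and large positive scores approach 1.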

Step-by-Step Implementation

Importing Libraries:
- Import the LogisticRegression class from sklearn.linear_model.
- Import train_test_split from sklearn.model_selection to split the dataset into training and testing sets.
- Import accuracy_score and confusion_matrix from sklearn.metrics to evaluate the model.
- Import numpy for numerical computations.
Preparing Data:
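- Create a two-dimensional NumPy array X (one row per student, one column for the single feature) representing the hours studied. scikit-learn expects X to have the shape (n_samples, n_features).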
- Create a NumPy array y representing the outcomes (0 for fail, 1 for pass).
Splitting the Data:
- Use train_test_split to divide the data into training and testing sets.
- Specify test_size=0.2 to use 20% of the data for testing and 80% for training.
- Set random_state=42 for reproducibility.
Initializing and Training the Model:
- Create an instance of the LogisticRegression class.
- Train the model using the training data (X_train, y_train).
Making Predictions:
- Use the trained model to make predictions on the test data X_test.
- The output y_pred contains the model's predictions (0 or 1).
Evaluating the Model:
- Calculate the accuracy of the model by comparing the actual values y_test to the predicted values y_pred using accuracy_score.
- Compute the confusion matrix to understand true positives, true negatives, false positives, and false negatives.
- Print the accuracy and confusion matrix.
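
To make the evaluation step more concrete, here is a minimal sketch of how the four confusion-matrix counts can be read out and related to accuracy. The label lists `y_true` and `y_hat` are made up purely to show the layout and are not results from this article's model.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels, invented only to illustrate the matrix layout
y_true = [0, 0, 1, 1, 1, 0]
y_hat = [0, 1, 1, 1, 0, 0]

# labels=[0, 1] fixes the order: rows are actual classes, columns are predicted
cm = confusion_matrix(y_true, y_hat, labels=[0, 1])

# For binary labels, the flattened matrix is ordered as TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()

accuracy = accuracy_score(y_true, y_hat)

print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
print("Accuracy:", accuracy)
```

Accuracy is the share of correct predictions, (TP + TN) divided by the total number of samples, so it can be checked directly against the matrix.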

Complete Code Example

# Import libraries
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
import numpy as np

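# Prepare data (illustrative values, assumed for this example)
# X: hours studied, one row per student; y: 0 = fail, 1 = pass
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])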

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
# Store the result under a new name so the confusion_matrix function is not overwritten
cm = confusion_matrix(y_test, y_pred)

# Print results
print("Accuracy:", accuracy)
print("Confusion Matrix:\n", cm)
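
Because logistic regression models a probability, it can help to look at the predicted probabilities as well as the 0/1 labels. The sketch below is self-contained and uses hypothetical data (`X_demo`, `y_demo`, `new_hours`) with scikit-learn's `predict_proba` method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied and pass (1) / fail (0) outcome
X_demo = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y_demo = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

demo_model = LogisticRegression()
demo_model.fit(X_demo, y_demo)

# Two new students: one studied 2 hours, the other 9 hours
new_hours = np.array([[2.0], [9.0]])

# Each row holds [P(class 0), P(class 1)] for one student
proba = demo_model.predict_proba(new_hours)
predicted = demo_model.predict(new_hours)

print(proba)
print(predicted)
```

Each row of `proba` sums to 1, and `predict` returns the class with the higher probability.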


Conclusion

This article walked through a complete logistic regression workflow: data preparation, model training, prediction, and performance evaluation. The same workflow applies to any binary classification task, where the goal is to predict one of two outcomes.
