Part 5 - Implementing K-Nearest Neighbors (KNN) in Python
Machine Learning Algorithms Series - Classification with scikit-learn
This article explains how to implement the K-Nearest Neighbors (KNN) algorithm in Python for classification problems. It covers importing necessary libraries, preparing data, training the model, making predictions, and evaluating performance using accuracy scores and confusion matrices.
Introduction to K-Nearest Neighbors (KNN)
The K-Nearest Neighbors (KNN) algorithm is a simple, non-parametric classification and regression algorithm. It classifies new data points based on the majority class of the K nearest points in the feature space. It is particularly useful for smaller datasets where the relationships among data points can be easily visualized.
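To make the majority-vote idea concrete, here is a minimal from-scratch sketch using NumPy. The helper function knn_predict and the data points are illustrative assumptions, not part of the article's dataset or scikit-learn's API:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among the labels of those neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Illustrative data: each row is [hours studied, prior grade]
X_train = np.array([[1, 50], [2, 55], [8, 85], [9, 90]])
y_train = np.array([0, 0, 1, 1])  # 0 = fail, 1 = pass
print(knn_predict(X_train, y_train, np.array([7, 80])))  # majority of the 3 neighbors is 1

The rest of the article uses scikit-learn, which implements this logic (plus efficient neighbor search) behind a standard fit/predict interface.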
Step-by-Step Implementation
Importing Libraries:
Import the KNeighborsClassifier class from sklearn.neighbors.
Import train_test_split from sklearn.model_selection to split the dataset into training and testing sets.
Import accuracy_score and confusion_matrix from sklearn.metrics to evaluate the model.
Import numpy for numerical operations, as shown below.
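In code, these imports match the complete example later in the article:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
import numpy as np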
Preparing Data:
Create a NumPy array X representing the hours studied and prior grades of students.
Create a NumPy array y representing the outcomes (0 for fail, 1 for pass), as in the snippet below.
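The article's exact numbers are not shown, so the values below are illustrative placeholders: each row of X is [hours studied, prior grade], and y holds the matching pass/fail labels.

import numpy as np

# Illustrative data: each row is [hours studied, prior grade]
X = np.array([[1, 45], [2, 50], [3, 55], [4, 60], [5, 65],
              [6, 70], [7, 75], [8, 80], [9, 85], [10, 90]])
# Outcomes: 0 = fail, 1 = pass
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])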
Splitting the Data:
Use train_test_split to divide the data into training and testing sets.
Specify test_size=0.2 to use 20% of the data for testing and 80% for training.
Set random_state=42 for reproducibility, as shown below.
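The split is a single call, as in the complete example below:

# 80/20 train/test split; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)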
Initializing and Training the Model:
Create an instance of the KNeighborsClassifier class, specifying the number of neighbors (n_neighbors). For example, n_neighbors=3 means the model will classify a new data point based on the majority class among its three nearest neighbors.
Train the model using the training data (X_train, y_train). KNN is a non-parametric, instance-based model: fitting simply stores the training data, and predictions are computed from the nearest neighbors at query time. See the snippet after this list.
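Initialization and training take two lines:

# Classify by majority vote among the 3 nearest neighbors
model = KNeighborsClassifier(n_neighbors=3)
# For KNN, fit simply stores the training data
model.fit(X_train, y_train)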
Making Predictions:
Use the trained KNN model to make predictions on the test data X_test.
The output y_pred contains the model's predictions (0 or 1), as shown below.
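Predicting on the held-out test set:

# One predicted label (0 or 1) per test sample
y_pred = model.predict(X_test)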
Evaluating the Model:
Calculate the accuracy of the model by comparing the actual values y_test to the predicted values y_pred using accuracy_score.
Compute the confusion matrix to understand true positives, true negatives, false positives, and false negatives.
Print the accuracy and confusion matrix, as in the snippet below.
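Evaluation mirrors the complete example; note that the result of confusion_matrix is stored under a different name (conf_matrix) so the imported function is not shadowed:

accuracy = accuracy_score(y_test, y_pred)       # fraction of correct predictions
conf_matrix = confusion_matrix(y_test, y_pred)  # rows: actual class, columns: predicted class
print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)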
Complete Code Example
# Import libraries
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
import numpy as np
# Prepare data
# Each row is [hours studied, prior grade]; the original values are not
# shown in the source, so these numbers are illustrative placeholders
X = np.array([[1, 45], [2, 50], [3, 55], [4, 60], [5, 65],
              [6, 70], [7, 75], [8, 80], [9, 85], [10, 90]])
# Outcomes: 0 = fail, 1 = pass
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train the model
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)  # avoid shadowing the imported function
# Print results
print("Accuracy:", accuracy)
print("Confusion Matrix:\n", confusion_matrix)
Conclusion
This article demonstrates a full workflow for training and evaluating a K-Nearest Neighbors classifier. The KNN model predicts binary outcomes based on hours studied and prior grades, classifying each test data point by looking at the classes of its nearest neighbors.

