Alright, let's put the concepts of simple linear regression into practice. We've discussed how it models a linear relationship, how a cost function measures error, and how gradient descent helps find the best-fitting line. Now, we'll walk through implementing this process using Python with the NumPy library for numerical operations and Matplotlib for visualization.
We'll use a straightforward, synthetic dataset: the relationship between hours studied and exam scores. This keeps the focus on the regression mechanism itself.
First, we need to import the libraries we'll use. If you don't have them installed, you might need to install them first (e.g., using pip install numpy matplotlib).
import numpy as np
import matplotlib.pyplot as plt
# Style settings for better plot readability
plt.style.use('seaborn-v0_8-whitegrid')
Let's imagine we have data from a few students showing how many hours they studied and the score they received on an exam.
# Hours Studied (Feature X)
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Exam Score (Target y)
y = np.array([45, 50, 55, 60, 65, 70, 75, 80, 85, 90])
# Let's print the data to see it
print("Hours Studied (X):", X)
print("Exam Score (y):", y)
This data suggests a positive linear relationship: more study hours generally lead to higher scores. Let's visualize it to confirm.
# Create the plot
fig, ax = plt.subplots()
ax.scatter(X, y, color='#1c7ed6', label='Student Data') # Use a blue color
ax.set_xlabel("Hours Studied")
ax.set_ylabel("Exam Score")
ax.set_title("Exam Score vs. Hours Studied")
ax.legend()
ax.grid(True)
plt.show()
A scatter plot showing the relationship between hours studied (x-axis) and exam score (y-axis) for our sample data.
The plot clearly shows the points clustering around a line, making simple linear regression a suitable model.
Remember our model for simple linear regression:

$$\hat{y} = mx + b$$

Here, $\hat{y}$ is the predicted exam score, x is the hours studied, m is the slope of the line, and b is the y-intercept. Our goal is to find the values of m and b that make the line fit the data best.
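Before any training, we can see what this model computes for an arbitrary parameter choice. A minimal sketch, where m_guess = 4 and b_guess = 50 are purely illustrative guesses, not trained values:

# Hypothetical parameter guesses (for illustration only)
m_guess = 4
b_guess = 50
# Vectorized prediction: NumPy applies the formula to every element of X at once
y_hat = m_guess * X + b_guess
print("Predicted scores:", y_hat)

Any choice of m and b defines a candidate line; what we need next is a way to score how well that line matches the observed data.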
We need a way to measure how well a given line (defined by m and b) fits the data. We use the Mean Squared Error (MSE) cost function:

$$J(m, b) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - (m x_i + b)\right)^2$$

where N is the number of data points.
Let's write a Python function to calculate this:
def calculate_cost(X, y, m, b):
    """
    Calculates the Mean Squared Error cost.

    Args:
        X: NumPy array of input features.
        y: NumPy array of target values.
        m: Current slope of the regression line.
        b: Current y-intercept of the regression line.

    Returns:
        The calculated MSE cost.
    """
    N = len(X)
    predictions = m * X + b      # Model predictions for every x
    error = y - predictions      # Residuals
    cost = np.sum(error**2) / N  # Mean of the squared residuals
    return cost
# Example: Calculate cost for an initial guess (m=0, b=0)
initial_m = 0
initial_b = 0
initial_cost = calculate_cost(X, y, initial_m, initial_b)
print(f"Initial Cost (m=0, b=0): {initial_cost}")
Now, we'll implement the gradient descent algorithm to find the optimal m and b by iteratively minimizing the cost function. We need the partial derivatives of the cost function J(m,b) with respect to m and b:
Derivative with respect to m:

$$\frac{\partial J}{\partial m} = -\frac{2}{N}\sum_{i=1}^{N} x_i \left(y_i - (m x_i + b)\right)$$

Derivative with respect to b:

$$\frac{\partial J}{\partial b} = -\frac{2}{N}\sum_{i=1}^{N} \left(y_i - (m x_i + b)\right)$$

The update rules for m and b in each iteration are:

$$m := m - \alpha \frac{\partial J}{\partial m} \qquad b := b - \alpha \frac{\partial J}{\partial b}$$

where $\alpha$ is the learning rate.
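Before writing the full loop, it can help to trace a single update by hand. Here is a minimal sketch of one step, starting from m = 0 and b = 0 with a learning rate of 0.01:

# One gradient descent step traced by hand, starting from m = 0, b = 0
alpha = 0.01
m_step, b_step = 0.0, 0.0
error = y - (m_step * X + b_step)           # predictions are all 0, so error is just y
grad_m = (-2 / len(X)) * np.sum(X * error)  # = -0.2 * 4125 = -825.0
grad_b = (-2 / len(X)) * np.sum(error)      # = -0.2 * 675  = -135.0
m_step = m_step - alpha * grad_m            # 0 - 0.01 * (-825) = 8.25
b_step = b_step - alpha * grad_b            # 0 - 0.01 * (-135) = 1.35
print(m_step, b_step)                       # 8.25 1.35

Notice that this first step actually overshoots the slope the process eventually settles on; it is the accumulation of many small steps that brings both parameters toward the minimum, which is what the function below automates.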
Let's write the gradient descent function:
def gradient_descent(X, y, initial_m, initial_b, learning_rate, iterations):
    """
    Performs gradient descent to find optimal m and b.

    Args:
        X: NumPy array of input features.
        y: NumPy array of target values.
        initial_m: Starting slope.
        initial_b: Starting y-intercept.
        learning_rate: Step size for updates.
        iterations: Number of iterations to run.

    Returns:
        A tuple (m, b, cost_history) containing the final slope,
        final intercept, and a list of costs at each iteration.
    """
    m = initial_m
    b = initial_b
    N = len(X)
    cost_history = []  # To store cost per iteration

    for i in range(iterations):
        # 1. Calculate predictions
        predictions = m * X + b
        # 2. Calculate the error
        error = y - predictions
        # 3. Calculate the gradients
        gradient_m = (-2/N) * np.sum(X * error)
        gradient_b = (-2/N) * np.sum(error)
        # 4. Update m and b
        m = m - learning_rate * gradient_m
        b = b - learning_rate * gradient_b
        # 5. Calculate and store the cost for this iteration
        cost = calculate_cost(X, y, m, b)
        cost_history.append(cost)
        # Optional: Print cost every few iterations to monitor progress
        if (i + 1) % 100 == 0:
            print(f"Iteration {i+1}/{iterations}, Cost: {cost:.4f}")

    return m, b, cost_history
# Set hyperparameters
learning_rate = 0.01
iterations = 1000
# Run gradient descent
final_m, final_b, cost_history = gradient_descent(X, y, initial_m, initial_b, learning_rate, iterations)
print(f"\nTraining finished.")
print(f"Final Slope (m): {final_m:.4f}")
print(f"Final Intercept (b): {final_b:.4f}")
print(f"Final Cost (MSE): {cost_history[-1]:.4f}")
You should see the cost decreasing with each printed iteration, indicating that gradient descent is successfully finding better values for m and b.
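Because this dataset was constructed to lie exactly on the line y = 5x + 40, we can also sanity-check the learned parameters against the known true values, a luxury only synthetic data allows:

# The data was generated from y = 5x + 40, so compare against those values
print(f"True slope: 5, learned slope: {final_m:.4f}")
print(f"True intercept: 40, learned intercept: {final_b:.4f}")

The slope typically converges faster than the intercept here; if the intercept is still noticeably off, more iterations (or a carefully tuned learning rate) close the gap.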
We can also plot the cost history to visualize the optimization process:
# Plotting the cost history
plt.figure()
plt.plot(range(iterations), cost_history, color='#f03e3e') # Use a red color
plt.xlabel("Iterations")
plt.ylabel("Cost (MSE)")
plt.title("Cost Function Value over Iterations")
plt.grid(True)
plt.show()
Cost (MSE) decreasing over gradient descent iterations, indicating the model is learning. Note: The actual values depend on the cost_history list generated by your code.
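If the curve drops so steeply at the start that later progress is invisible, one optional tweak is a logarithmic y-axis, which spreads out the small late-stage improvements:

# Same cost history on a log scale to expose late-stage progress
plt.figure()
plt.plot(range(iterations), cost_history, color='#f03e3e')
plt.yscale('log')
plt.xlabel("Iterations")
plt.ylabel("Cost (MSE, log scale)")
plt.title("Cost over Iterations (log scale)")
plt.grid(True)
plt.show()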
Now, let's plot our original data along with the regression line we just found using the final_m and final_b values.
# Generate points for the regression line
line_x = np.array([min(X) - 1, max(X) + 1]) # Extend line slightly beyond data range
line_y = final_m * line_x + final_b
# Create the plot
fig, ax = plt.subplots()
ax.scatter(X, y, color='#1c7ed6', label='Student Data') # Blue for data points
ax.plot(line_x, line_y, color='#f03e3e', label='Regression Line') # Red for the line
ax.set_xlabel("Hours Studied")
ax.set_ylabel("Exam Score")
ax.set_title("Simple Linear Regression Fit")
ax.legend()
ax.grid(True)
plt.show()
Scatter plot of the student data points with the calculated best-fit regression line overlaid. Note: The exact line depends on the final_m and final_b calculated by your code.
The line should pass nicely through the center of our data points.
With our trained model (i.e., having found final_m and final_b), we can now predict the exam score for a new number of study hours. For instance, what score would we predict for someone who studied for 7.5 hours?
# Predict score for 7.5 hours of study
hours_new = 7.5
predicted_score = final_m * hours_new + final_b
print(f"\nPredicted score for {hours_new} hours of study: {predicted_score:.2f}")
In this practice section, you implemented simple linear regression from the ground up:
- Creating and visualizing a small synthetic dataset of study hours and exam scores.
- Implementing the Mean Squared Error (MSE) cost function.
- Implementing gradient descent to iteratively find the slope m and intercept b.
- Plotting the cost history and the fitted regression line.
- Using the learned parameters to predict the score for a new input.
While libraries like Scikit-learn (which we'll encounter later) can perform these steps with just a few lines of code, understanding the mechanics of cost functions and gradient descent provides a solid foundation for tackling more complex machine learning models. You've now seen how a model can "learn" from data by iteratively adjusting its parameters to reduce prediction errors.
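As a small preview of that (assuming scikit-learn is installed; we cover it properly later), the equivalent model takes only a few lines:

# Equivalent fit with scikit-learn's LinearRegression (solved in closed form)
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X.reshape(-1, 1), y)  # scikit-learn expects a 2D feature array
print(f"scikit-learn fit: m = {model.coef_[0]:.4f}, b = {model.intercept_:.4f}")

Under the hood, LinearRegression solves the same least-squares problem directly rather than by iterative gradient descent, so its answer matches the closed-form check above.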