Simple linear regression models a linear relationship between two variables. A cost function measures prediction error, and gradient descent finds the best-fitting line by minimizing that error. We'll implement the whole process in Python, using NumPy for numerical operations and Matplotlib for visualization, on a straightforward synthetic dataset: the relationship between hours studied and exam scores. This keeps the focus on the regression mechanism itself.

## Setting Up Our Tools

First, we import the libraries we'll use. If they aren't installed yet, install them first (e.g., with `pip install numpy matplotlib`).

```python
import numpy as np
import matplotlib.pyplot as plt

# Style settings for better plot visibility
plt.style.use('seaborn-v0_8-whitegrid')
```

## Our Sample Data

Let's imagine we have data from a few students showing how many hours they studied and the score they received on an exam.

```python
# Hours Studied (Feature X)
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Exam Score (Target y)
y = np.array([45, 50, 55, 60, 65, 70, 75, 80, 85, 90])

# Print the data to inspect it
print("Hours Studied (X):", X)
print("Exam Score (y):", y)
```

This data suggests a positive linear relationship: more study hours generally lead to higher scores. Let's visualize it to confirm.

```python
# Create the scatter plot
fig, ax = plt.subplots()
ax.scatter(X, y, color='#1c7ed6', label='Student Data')  # Blue for data points
ax.set_xlabel("Hours Studied")
ax.set_ylabel("Exam Score")
ax.set_title("Exam Score vs. Hours Studied")
ax.legend()
ax.grid(True)
plt.show()
```

*Figure: A scatter plot showing the relationship between hours studied (x-axis) and exam score (y-axis) for our sample data.*

The plot clearly shows the points clustering around a line, making simple linear regression a suitable model.

## The Linear Regression Model

Recall our model for simple linear regression:

$$ \hat{y} = mx + b $$

Here, $\hat{y}$ is the predicted exam score, $x$ is the hours studied, $m$ is the slope of the line, and $b$ is the y-intercept. Our goal is to find the values of $m$ and $b$ that make the line fit the data best.

## Implementing the Cost Function (MSE)

We need a way to measure how well a given line (defined by $m$ and $b$) fits the data. We use the Mean Squared Error (MSE) cost function:

$$ J(m, b) = \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - \hat{y}_i\bigr)^2 = \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - (mx_i + b)\bigr)^2 $$

where $N$ is the number of data points.

Let's write a Python function to calculate this:

```python
def calculate_cost(X, y, m, b):
    """
    Calculates the Mean Squared Error cost.

    Args:
        X: NumPy array of input features.
        y: NumPy array of target values.
        m: Current slope of the regression line.
        b: Current y-intercept of the regression line.

    Returns:
        The calculated MSE cost.
    """
    N = len(X)
    predictions = m * X + b
    error = y - predictions
    cost = np.sum(error**2) / N
    return cost

# Example: calculate the cost for an initial guess (m=0, b=0)
initial_m = 0
initial_b = 0
initial_cost = calculate_cost(X, y, initial_m, initial_b)
print(f"Initial Cost (m=0, b=0): {initial_cost}")
```

## Implementing Gradient Descent

Now we'll implement the gradient descent algorithm to find the optimal $m$ and $b$ by iteratively minimizing the cost function.
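Before deriving the update rules, it helps to see why a systematic minimization procedure is needed at all. The short, illustrative sketch below assumes the `X`, `y`, and `calculate_cost` definitions from above; the candidate slope values are arbitrary guesses chosen only for demonstration. It evaluates the cost for a handful of slopes with the intercept held at 0, showing that the cost changes as $m$ changes.

```python
# Illustrative only: scan a few candidate slopes with the intercept fixed at 0.
# Assumes X, y, and calculate_cost() are defined as above; the slope values
# below are arbitrary guesses, not part of the worked example.
for m_candidate in [0.0, 2.0, 4.0, 6.0, 8.0]:
    cost = calculate_cost(X, y, m_candidate, 0)
    print(f"m = {m_candidate:>4}, b = 0 -> cost = {cost:.2f}")
```

Trying values by hand like this doesn't scale; gradient descent is an automated way of walking toward the parameter values that make the cost smallest, which is what the derivative-based updates below provide.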
We need the partial derivatives of the cost function $J(m, b)$ with respect to $m$ and $b$.

Derivative with respect to $m$:

$$ \frac{\partial J}{\partial m} = \frac{-2}{N} \sum_{i=1}^{N} x_i \bigl(y_i - (mx_i + b)\bigr) $$

Derivative with respect to $b$:

$$ \frac{\partial J}{\partial b} = \frac{-2}{N} \sum_{i=1}^{N} \bigl(y_i - (mx_i + b)\bigr) $$

The update rules for $m$ and $b$ in each iteration are:

$$ m := m - \alpha \frac{\partial J}{\partial m} $$

$$ b := b - \alpha \frac{\partial J}{\partial b} $$

where $\alpha$ is the learning rate.

Let's write the gradient descent function:

```python
def gradient_descent(X, y, initial_m, initial_b, learning_rate, iterations):
    """
    Performs gradient descent to find optimal m and b.

    Args:
        X: NumPy array of input features.
        y: NumPy array of target values.
        initial_m: Starting slope.
        initial_b: Starting y-intercept.
        learning_rate: Step size for updates.
        iterations: Number of iterations to run.

    Returns:
        A tuple (m, b, cost_history) containing the final slope,
        final intercept, and a list of costs at each iteration.
    """
    m = initial_m
    b = initial_b
    N = len(X)
    cost_history = []  # To store the cost per iteration

    for i in range(iterations):
        # 1. Calculate predictions
        predictions = m * X + b

        # 2. Calculate the error
        error = y - predictions

        # 3. Calculate the gradients
        gradient_m = (-2 / N) * np.sum(X * error)
        gradient_b = (-2 / N) * np.sum(error)

        # 4. Update m and b
        m = m - learning_rate * gradient_m
        b = b - learning_rate * gradient_b

        # 5. Calculate and store the cost for this iteration
        cost = calculate_cost(X, y, m, b)
        cost_history.append(cost)

        # Optional: print the cost every few iterations to monitor progress
        if (i + 1) % 100 == 0:
            print(f"Iteration {i+1}/{iterations}, Cost: {cost:.4f}")

    return m, b, cost_history

# Set hyperparameters
learning_rate = 0.01
iterations = 1000

# Run gradient descent
final_m, final_b, cost_history = gradient_descent(
    X, y, initial_m, initial_b, learning_rate, iterations
)

print("\nTraining finished.")
print(f"Final Slope (m): {final_m:.4f}")
print(f"Final Intercept (b): {final_b:.4f}")
print(f"Final Cost (MSE): {cost_history[-1]:.4f}")
```

You should see the cost decreasing with each printed iteration, indicating that gradient descent is successfully finding better values for $m$ and $b$.

We can also plot the cost history to visualize the optimization process:

```python
# Plot the cost history
plt.figure()
plt.plot(range(iterations), cost_history, color='#f03e3e')  # Red for the cost curve
plt.xlabel("Iterations")
plt.ylabel("Cost (MSE)")
plt.title("Cost Function Value over Iterations")
plt.grid(True)
plt.show()
```

*Figure: Cost (MSE) decreasing over gradient descent iterations, indicating the model is learning. The exact values depend on the cost_history list generated by your code.*
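Because simple linear regression also has a closed-form least-squares solution, we can sanity-check the gradient descent result against a direct fit. The sketch below assumes the `X`, `y`, `final_m`, and `final_b` variables from the code above and uses NumPy's `np.polyfit` with degree 1 purely as a reference; the two fits should land in the same ballpark, though they won't match exactly after a finite number of iterations.

```python
# Optional sanity check: compare against NumPy's closed-form least-squares fit.
# Assumes X, y, final_m, and final_b from the code above.
ref_m, ref_b = np.polyfit(X, y, 1)  # degree-1 polynomial fit returns (slope, intercept)
print(f"Closed-form fit:      m = {ref_m:.4f}, b = {ref_b:.4f}")
print(f"Gradient descent fit: m = {final_m:.4f}, b = {final_b:.4f}")
```

If the two disagree substantially, the usual suspects are the learning rate (too large and the cost diverges, too small and 1000 iterations isn't enough) or the iteration count.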
## Visualizing the Result

Now let's plot our original data along with the regression line we just found, using the final_m and final_b values.

```python
# Generate points for the regression line (extended slightly past the data range)
line_x = np.array([min(X) - 1, max(X) + 1])
line_y = final_m * line_x + final_b

# Create the plot
fig, ax = plt.subplots()
ax.scatter(X, y, color='#1c7ed6', label='Student Data')            # Blue for data points
ax.plot(line_x, line_y, color='#f03e3e', label='Regression Line')  # Red for the line
ax.set_xlabel("Hours Studied")
ax.set_ylabel("Exam Score")
ax.set_title("Simple Linear Regression Fit")
ax.legend()
ax.grid(True)
plt.show()
```

*Figure: Scatter plot of the student data points with the calculated best-fit regression line overlaid. The exact line depends on the final_m and final_b calculated by your code.*

The line should pass nicely through the center of our data points.

## Making Predictions

With our trained model (i.e., having found final_m and final_b), we can now predict the exam score for a new number of study hours. For instance, what score would we predict for someone who studied for 7.5 hours?

```python
# Predict the score for 7.5 hours of study
hours_new = 7.5
predicted_score = final_m * hours_new + final_b
print(f"\nPredicted score for {hours_new} hours of study: {predicted_score:.2f}")
```

## Summary

In this practice section, you implemented simple linear regression from the ground up:

- Visualized sample data.
- Implemented the MSE cost function.
- Implemented the gradient descent algorithm to minimize the cost.
- Used the learned parameters ($m$ and $b$) to draw the regression line.
- Made a prediction on new data.

While libraries like Scikit-learn (which we'll encounter later) can perform these steps with just a few lines of code, understanding the mechanics of cost functions and gradient descent provides a solid foundation for tackling more complex machine learning models. You've now seen how a model can "learn" from data by iteratively adjusting its parameters to reduce prediction errors.
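As a preview of that library approach, here is a minimal sketch of the same fit using scikit-learn's LinearRegression. This assumes scikit-learn is installed (it isn't part of the setup above) and reuses the `X` and `y` arrays; LinearRegression solves for the slope and intercept directly rather than via gradient descent.

```python
# A minimal sketch of the same fit with scikit-learn (assumes scikit-learn is installed).
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X.reshape(-1, 1), y)  # scikit-learn expects a 2D feature array

print(f"Slope (m): {model.coef_[0]:.4f}")
print(f"Intercept (b): {model.intercept_:.4f}")
print(f"Predicted score for 7.5 hours: {model.predict(np.array([[7.5]]))[0]:.2f}")
```

The slope and intercept it reports should closely match the values you obtained with gradient descent, which is a good final check on your implementation.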