Implement the basic gradient descent algorithm from scratch using Python. This practical exercise demonstrates the iterative optimization process in action.

We'll apply gradient descent to find the minimum of a simple single-variable function. While machine learning problems involve functions with thousands or millions of variables (parameters), starting with one variable makes it easier to grasp the core update mechanism and visualize the process.

## The Problem: Minimizing a Simple Quadratic Function

Let's consider the function $f(x) = x^2 - 4x + 5$. This is a parabola opening upwards, so it has a single global minimum. Our goal is to find the value of $x$ that minimizes $f(x)$ using gradient descent.

## Step 1: Define the Function and its Gradient

First, we need the function itself and its derivative (gradient). The function is:

$$f(x) = x^2 - 4x + 5$$

The derivative (the gradient in this 1D case) tells us the slope at any point $x$:

$$\nabla f(x) = f'(x) = \frac{d}{dx}(x^2 - 4x + 5) = 2x - 4$$

We can define these in Python:

```python
# Define the function
def f(x):
    return x**2 - 4*x + 5

# Define the gradient (derivative) of the function
def gradient(x):
    return 2*x - 4
```

## Step 2: Implement the Gradient Descent Algorithm

The core of gradient descent is the update rule, applied iteratively:

$$x_{\text{new}} = x_{\text{old}} - \alpha \nabla f(x_{\text{old}})$$

where $\alpha$ is the learning rate.

Let's implement this. We need:

- An initial guess for $x$.
- A learning rate $\alpha$.
- A number of iterations to run the algorithm.

```python
def gradient_descent(start_x, learning_rate, n_iterations):
    """
    Performs gradient descent to find the minimum of f(x).

    Args:
        start_x: The initial guess for x.
        learning_rate: The step size alpha.
        n_iterations: The number of iterations to perform.

    Returns:
        A tuple containing:
        - The final estimated value of x.
        - A list of x values at each iteration (history).
    """
    x = start_x
    history = [x]  # Store history for visualization

    print(f"Starting Gradient Descent at x = {x:.4f}")
    print("Iter | Current x | Gradient | Next x")
    print("--------------------------------------")

    for i in range(n_iterations):
        grad = gradient(x)
        next_x = x - learning_rate * grad
        print(f"{i+1:4} | {x:9.4f} | {grad:8.4f} | {next_x:9.4f}")
        x = next_x
        history.append(x)

    print("--------------------------------------")
    print(f"Gradient Descent finished after {n_iterations} iterations.")
    print(f"Estimated minimum occurs at x = {x:.4f}")
    print(f"Value of f(x) at minimum: {f(x):.4f}")

    return x, history

# --- Set Parameters and Run ---
initial_x = 0.0   # Starting point
alpha = 0.1       # Learning rate
iterations = 25   # Number of steps

final_x, x_history = gradient_descent(initial_x, alpha, iterations)
```

Running this code prints the progress of the algorithm at each step, showing how the value of $x$ approaches the minimum. You should observe that the gradient gets closer to zero as $x$ approaches the minimum (which we know analytically is at $x = 2$, since $2x - 4 = 0$ implies $x = 2$).

## Step 3: Visualizing the Descent

Visualizing the process can provide further insight.
We can plot the function $f(x)$ and overlay the steps taken by the gradient descent algorithm.

```json
{
  "data": [
    {
      "x": [-1, 0, 1, 2, 3, 4, 5],
      "y": [10, 5, 2, 1, 2, 5, 10],
      "type": "scatter",
      "mode": "lines",
      "name": "f(x) = x^2 - 4x + 5",
      "line": {"color": "#4263eb", "width": 2}
    },
    {
      "x": [0.0, 0.4, 0.72, 0.976, 1.1808, 1.3446, 1.4757, 1.5806, 1.6645, 1.7316, 1.7853, 1.8282, 1.8626, 1.8901, 1.9121, 1.9296, 1.9437, 1.955, 1.964, 1.9712, 1.977, 1.9816, 1.9852, 1.9882, 1.9906, 1.9924],
      "y": [5.0, 3.56, 2.6384, 2.0486, 1.6711, 1.4296, 1.2749, 1.1759, 1.1126, 1.072, 1.0461, 1.0295, 1.0189, 1.0121, 1.0077, 1.005, 1.0032, 1.002, 1.0013, 1.0008, 1.0005, 1.0003, 1.0002, 1.0001, 1.0001, 1.0001],
      "type": "scatter",
      "mode": "markers+lines",
      "name": "Gradient Descent Steps (α=0.1)",
      "marker": {"color": "#f03e3e", "size": 8},
      "line": {"color": "#ffa8a8", "width": 1, "dash": "dot"}
    }
  ],
  "layout": {
    "title": "Gradient Descent Path on f(x)",
    "xaxis": {"title": "x"},
    "yaxis": {"title": "f(x)"},
    "legend": {"x": 0.01, "y": 0.99},
    "margin": {"l": 50, "r": 20, "t": 40, "b": 40},
    "width": 600,
    "height": 400
  }
}
```

The plot shows the quadratic function $f(x)$ and the path taken by gradient descent starting from $x = 0$ with a learning rate of $0.1$. Each marker represents the value of $x$ at an iteration, moving progressively towards the minimum at $x = 2$. The snippet above hard-codes sample values for the path; in practice you would use the `x_history` values returned by the `gradient_descent` function for the x-coordinates and compute the corresponding y-coordinates with `f(x)`.

## Experimenting with Parameters

Try changing `initial_x`, `learning_rate` (`alpha`), and `iterations`.
Observe how these changes affect the convergence:

- **Different `initial_x`:** The algorithm should still converge to the same minimum for this convex function.
- **Larger `learning_rate` (e.g., 0.8):** Might converge faster, but if it's too large (e.g., 1.1 for this function), the algorithm overshoots the minimum and diverges: the $x$ values oscillate from one side of the minimum to the other with ever-growing amplitude.
- **Smaller `learning_rate` (e.g., 0.01):** Convergence will be slower, requiring more iterations to get close to the minimum.
- **Fewer `iterations`:** The algorithm might stop before reaching the minimum.

This simple implementation demonstrates the core concept. In machine learning, $f(x)$ represents the cost function $J(\theta)$, $x$ represents the entire set of model parameters $\theta$, and $\nabla f(x)$ represents the gradient vector $\nabla J(\theta)$. The gradient calculation often involves techniques like backpropagation (covered in the next chapter), but the fundamental update step remains the same: adjust parameters in the direction opposite to the gradient to minimize the cost.
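To make the multi-parameter case concrete, here is a minimal sketch of the same update rule applied to a vector of parameters with NumPy, using a gradient-norm stopping criterion instead of a fixed iteration count. The cost function and all names here are illustrative inventions for this sketch, not part of the exercise above:

```python
import numpy as np

# Illustrative cost: J(theta) = (theta0 - 3)^2 + (theta1 + 1)^2,
# a convex bowl minimized at theta = [3, -1].
def cost(theta):
    return (theta[0] - 3)**2 + (theta[1] + 1)**2

def grad(theta):
    # Gradient vector: one partial derivative per parameter
    return np.array([2*(theta[0] - 3), 2*(theta[1] + 1)])

def gradient_descent_nd(theta0, learning_rate=0.1, tol=1e-6, max_iter=1000):
    """Gradient descent on a parameter vector, stopping when the
    gradient norm falls below tol (i.e., the surface is nearly flat)."""
    theta = np.asarray(theta0, dtype=float)
    for i in range(max_iter):
        g = grad(theta)
        if np.linalg.norm(g) < tol:
            break
        theta = theta - learning_rate * g  # same update rule, now vectorized
    return theta, i

theta_min, n_steps = gradient_descent_nd([0.0, 0.0])
print(theta_min)  # approaches [3, -1]
```

The update line is identical to the single-variable version; the only change is that `theta` and `grad(theta)` are arrays, so one line updates every parameter at once.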