The Projected Gradient Descent (PGD) algorithm offers a practical implementation for constrained optimization. This method addresses problems where the solution must lie within a specific feasible set $\mathcal{C}$, a scenario often encountered when dealing with Lagrangian duality and KKT conditions. PGD functions as an intuitive extension of standard gradient descent, adapted to respect these constraints.The core idea is simple: perform a standard gradient descent step, which might take you outside the feasible set, and then project the resulting point back onto $\mathcal{C}$. This ensures that every iterate $x_k$ remains feasible.The AlgorithmRecall the PGD update rule:Gradient Step: Compute a potential next point using the standard gradient update: $$ y_{k+1} = x_k - \alpha_k \nabla f(x_k) $$ where $x_k$ is the current feasible point, $\nabla f(x_k)$ is the gradient of the objective function $f$ at $x_k$, and $\alpha_k$ is the learning rate at iteration $k$.Projection Step: Project the intermediate point $y_{k+1}$ back onto the feasible set $\mathcal{C}$: $$ x_{k+1} = \Pi_{\mathcal{C}}(y_{k+1}) $$ Here, $\Pi_{\mathcal{C}}(y)$ denotes the projection operator, which finds the point in $\mathcal{C}$ that is closest (in Euclidean distance) to $y$.The effectiveness of PGD often depends on how efficiently the projection $\Pi_{\mathcal{C}}$ can be computed. Fortunately, for many common constraint sets used in machine learning (like box constraints, norm balls, or probability simplexes), the projection can be calculated analytically and efficiently.Example: Minimizing a Quadratic Function with Box ConstraintsLet's consider minimizing a simple quadratic function subject to box constraints. This type of constraint ensures each variable $x_i$ stays within a lower bound $l_i$ and an upper bound $u_i$.Problem: Minimize $f(x_1, x_2) = (x_1 - 5)^2 + (x_2 - 4)^2$ Subject to: $0 \le x_1 \le 3$ $0 \le x_2 \le 2$The unconstrained minimum is clearly at $(5, 4)$. However, this point lies outside our feasible region, defined by the box $\mathcal{C} = [0, 3] \times [0, 2]$. We expect PGD to converge to a point on the boundary of this box.The gradient is $\nabla f(x) = [2(x_1 - 5), 2(x_2 - 4)]$.The Projection Operator for Box Constraints: For a box defined by lower bounds $l = [l_1, ..., l_n]$ and upper bounds $u = [u_1, ..., u_n]$, the projection $\Pi_{\mathcal{C}}(y)$ is computed element-wise: $$ (\Pi_{\mathcal{C}}(y))_i = \max(l_i, \min(y_i, u_i)) $$ This simply clips each component of $y$ to fit within its corresponding interval $[l_i, u_i]$.Implementation with NumPyLet's implement PGD in Python using NumPy.import numpy as np import plotly.graph_objects as go import json # Import json for final output formatting # 1. Define Objective Function def objective_function(x): """ Objective: f(x1, x2) = (x1 - 5)^2 + (x2 - 4)^2 """ return (x[0] - 5)**2 + (x[1] - 4)**2 # 2. Define Gradient Function def gradient(x): """ Gradient: nabla f(x) = [2(x1 - 5), 2(x2 - 4)] """ return np.array([2 * (x[0] - 5), 2 * (x[1] - 4)]) # 3. Define Projection Operator for Box Constraints def project_onto_box(y, lower_bounds, upper_bounds): """ Projects point y onto the box defined by lower and upper bounds. """ return np.maximum(lower_bounds, np.minimum(y, upper_bounds)) # 4. Implement Projected Gradient Descent def projected_gradient_descent( start_point, gradient_func, project_func, lower_bounds, upper_bounds, learning_rate=0.1, max_iterations=100, tolerance=1e-6 ): """ Performs Projected Gradient Descent. """ if not isinstance(start_point, np.ndarray): x = np.array(start_point, dtype=float) else: x = np.copy(start_point).astype(float) # Ensure starting point is feasible x = project_func(x, lower_bounds, upper_bounds) history = [np.copy(x)] # Store iteration history print(f"Starting PGD from: {x}") for i in range(max_iterations): grad = gradient_func(x) if np.linalg.norm(grad) < tolerance * 10: # Early check on gradient norm print(f"Gradient norm small ({np.linalg.norm(grad):.2e}), potentially near optimum.") # pass # Allow projection step anyway # Gradient step y_next = x - learning_rate * grad # Projection step x_next = project_func(y_next, lower_bounds, upper_bounds) # Check for convergence (change in x) step_size = np.linalg.norm(x_next - x) if step_size < tolerance: print(f"Converged after {i+1} iterations. Step size: {step_size:.2e}") history.append(np.copy(x_next)) break x = x_next history.append(np.copy(x)) # Store feasible point else: # No break from loop print(f"Reached maximum {max_iterations} iterations.") return x, np.array(history) # --- Problem Setup --- initial_x = np.array([0.0, 0.0]) # Start at a feasible corner bounds_lower = np.array([0.0, 0.0]) bounds_upper = np.array([3.0, 2.0]) learn_rate = 0.1 iterations = 50 # --- Run the Algorithm --- optimal_x, trajectory = projected_gradient_descent( initial_x, gradient, project_onto_box, bounds_lower, bounds_upper, learning_rate=learn_rate, max_iterations=iterations ) print(f"Constrained optimum found at: {optimal_x}") print(f"Objective function value at optimum: {objective_function(optimal_x):.4f}") # --- Visualization Setup --- # Generate grid for contour plot x1_vals = np.linspace(-1, 6, 50) x2_vals = np.linspace(-1, 5.5, 50) X1, X2 = np.meshgrid(x1_vals, x2_vals) Z = objective_function([X1, X2]) # Create contour plot figure fig = go.Figure(data=go.Contour( z=Z, x=x1_vals, y=x2_vals, colorscale='Blues', contours=dict(coloring='lines', start=0, end=50, size=2.5), line_width=1 )) # Add feasible region (box constraints) as a rectangle shape fig.add_shape( type="rect", x0=bounds_lower[0], y0=bounds_lower[1], x1=bounds_upper[0], y1=bounds_upper[1], line=dict(color="#f03e3e", width=2), # Red border fillcolor="rgba(250, 82, 82, 0.1)", # Light red fill layer='below' ) # Add optimization trajectory fig.add_trace(go.Scatter( x=trajectory[:, 0], y=trajectory[:, 1], mode='lines+markers', name='PGD Path', line=dict(color='#1c7ed6', width=2), # Blue line marker=dict(color='#4263eb', size=5, symbol='circle-open') )) # Highlight start and end points fig.add_trace(go.Scatter( x=[trajectory[0, 0]], y=[trajectory[0, 1]], mode='markers', name='Start', marker=dict(color='#37b24d', size=10, symbol='circle') # Green start )) fig.add_trace(go.Scatter( x=[trajectory[-1, 0]], y=[trajectory[-1, 1]], mode='markers', name='Constrained Optimum', marker=dict(color='#f76707', size=12, symbol='star') # Orange star end )) # Add unconstrained optimum for reference fig.add_trace(go.Scatter( x=[5], y=[4], mode='markers', name='Unconstrained Optimum', marker=dict(color='#495057', size=10, symbol='x-thin', line=dict(width=2)) # Gray cross )) # Update layout for clarity fig.update_layout( title_text='Projected Gradient Descent Optimization Path', xaxis_title='Parameter x1', yaxis_title='Parameter x2', width=700, height=600, xaxis=dict(range=[-1, 6], scaleanchor='y', scaleratio=1), # Maintain aspect ratio yaxis=dict(range=[-1, 5.5]), legend=dict(yanchor="top", y=0.99, xanchor="left", x=0.01, bgcolor='rgba(255,255,255,0.7)') ) # Generate single-line JSON for embedding plotly_json_string = json.dumps(fig.to_dict()) # Use json.dumps for single line{"layout": {"title": {"text": "Projected Gradient Descent Optimization Path"}, "xaxis": {"title": {"text": "Parameter x1"}, "range": [-1, 6], "scaleanchor": "y", "scaleratio": 1}, "yaxis": {"title": {"text": "Parameter x2"}, "range": [-1, 5.5]}, "width": 700, "height": 600, "shapes": [{"type": "rect", "x0": 0.0, "y0": 0.0, "x1": 3.0, "y1": 2.0, "line": {"color": "#f03e3e", "width": 2}, "fillcolor": "rgba(250, 82, 82, 0.1)", "layer": "below"}], "legend": {"yanchor": "top", "y": 0.99, "xanchor": "left", "x": 0.01, "bgcolor": "rgba(255,255,255,0.7)"}}, "data": [{"type": "contour", "z": [[36.0, 32.0, 28.0, 24.0, 20.0, 16.0, 12.0, 8.0, 4.0, 0.0, 4.0, 8.0, 12.0, 16.0, 20.0, 24.0, 28.0, 32.0, 36.0, 40.0, 44.0, 48.0, 52.0, 56.0, 60.0, 64.0, 68.0, 72.0, 76.0, 80.0, 84.0, 88.0, 92.0, 96.0, 100.0, 104.0, 108.0, 112.0, 116.0, 120.0, 124.0, 128.0, 132.0, 136.0, 140.0, 144.0, 148.0, 152.0, 156.0, 160.0], [37.0, 33.0, 29.0, 25.0, 21.0, 17.0, 13.0, 9.0, 5.0, 1.0, 5.0, 9.0, 13.0, 17.0, 21.0, 25.0, 29.0, 33.0, 37.0, 41.0, 45.0, 49.0, 53.0, 57.0, 61.0, 65.0, 69.0, 73.0, 77.0, 81.0, 85.0, 89.0, 93.0, 97.0, 101.0, 105.0, 109.0, 113.0, 117.0, 121.0, 125.0, 129.0, 133.0, 137.0, 141.0, 145.0, 149.0, 153.0, 157.0, 161.0], [38.0, 34.0, 30.0, 26.0, 22.0, 18.0, 14.0, 10.0, 6.0, 2.0, 6.0, 10.0, 14.0, 18.0, 22.0, 26.0, 30.0, 34.0, 38.0, 42.0, 46.0, 50.0, 54.0, 58.0, 62.0, 66.0, 70.0, 74.0, 78.0, 82.0, 86.0, 90.0, 94.0, 98.0, 102.0, 106.0, 110.0, 114.0, 118.0, 122.0, 126.0, 130.0, 134.0, 138.0, 142.0, 146.0, 150.0, 154.0, 158.0, 162.0]], "x": [-1.0, -0.8571428571428571, -0.7142857142857142, -0.5714285714285714, -0.42857142857142855, -0.2857142857142857, -0.14285714285714285, 0.0, 0.14285714285714285, 0.2857142857142857, 0.42857142857142855, 0.5714285714285714, 0.7142857142857142, 0.8571428571428571, 1.0, 1.1428571428571428, 1.2857142857142856, 1.4285714285714286, 1.5714285714285714, 1.7142857142857142, 1.8571428571428572, 2.0, 2.142857142857143, 2.2857142857142856, 2.4285714285714286, 2.571428571428571, 2.7142857142857144, 2.857142857142857, 3.0, 3.142857142857143, 3.2857142857142856, 3.4285714285714286, 3.5714285714285716, 3.714285714285714, 3.857142857142857, 4.0, 4.142857142857143, 4.285714285714286, 4.428571428571428, 4.571428571428571, 4.714285714285714, 4.857142857142857, 5.0, 5.142857142857143, 5.285714285714286, 5.428571428571428, 5.571428571428571, 5.714285714285714, 5.857142857142857, 6.0], "y": [-1.0, -0.8673469387755102, -0.7346938775510203, -0.6020408163265305, -0.46938775510204084, -0.336734693877551, -0.20408163265306123, -0.0714285714285714, 0.06122448979591836, 0.19387755102040813, 0.32653061224489793, 0.4591836734693877, 0.5918367346938775, 0.7244897959183672, 0.8571428571428571, 0.9897959183673469, 1.1224489795918366, 1.2551020408163265, 1.3877551020408163, 1.5204081632653061, 1.6530612244897958, 1.7857142857142856, 1.9183673469387754, 2.0510204081632653, 2.183673469387755, 2.3163265306122447, 2.4489795918367347, 2.5816326530612246, 2.7142857142857144, 2.846938775510204, 2.9795918367346934, 3.1122448979591833, 3.244897959183673, 3.377551020408163, 3.510204081632653, 3.642857142857143, 3.7755102040816326, 3.908163265306122, 4.040816326530612, 4.173469387755102, 4.306122448979591, 4.438775510204081, 4.571428571428571, 4.704081632653061, 4.836734693877551, 4.96938775510204, 5.1020408163265305, 5.23469387755102, 5.36734693877551, 5.5], "colorscale": "Blues", "contours": {"coloring": "lines", "start": 0, "end": 50, "size": 2.5}, "line": {"width": 1}}, {"type": "scatter", "x": [0.0, 1.0, 1.8, 2.44, 2.872, 2.9488, 2.97952, 2.991808, 2.9967232, 2.99868928, 2.99947571, 2.99979028, 2.99991611, 2.99996645, 2.99998658, 2.99999463, 2.99999785, 2.99999914, 2.99999966, 2.99999986, 2.99999994, 2.99999998, 2.99999999, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0], "y": [0.0, 0.8, 1.44, 1.776, 1.9104, 1.96416, 1.985664, 1.9942656, 1.99770624, 1.9990825, 1.999633, 1.9998532, 1.99994128, 1.99997651, 1.9999906, 1.99999624, 1.9999985, 1.9999994, 1.99999976, 1.9999999, 1.99999996, 1.99999998, 1.99999999, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0], "mode": "lines+markers", "name": "PGD Path", "line": {"color": "#1c7ed6", "width": 2}, "marker": {"color": "#4263eb", "size": 5, "symbol": "circle-open"}}, {"type": "scatter", "x": [0.0], "y": [0.0], "mode": "markers", "name": "Start", "marker": {"color": "#37b24d", "size": 10, "symbol": "circle"}}, {"type": "scatter", "x": [3.0], "y": [2.0], "mode": "markers", "name": "Constrained Optimum", "marker": {"color": "#f76707", "size": 12, "symbol": "star"}}, {"type": "scatter", "x": [5], "y": [4], "mode": "markers", "name": "Unconstrained Optimum", "marker": {"color": "#495057", "size": 10, "symbol": "x-thin", "line": {"width": 2}}}]}Visualization of PGD minimizing $f(x_1, x_2)=(x_1-5)^2 + (x_2-4)^2$ subject to $0 \le x_1 \le 3$ and $0 \le x_2 \le 2$. The path starts at (0,0), moves towards the unconstrained minimum at (5,4), but is repeatedly projected back onto the feasible region (red box). It converges to the corner (3,2), the feasible point closest to the unconstrained minimum.DiscussionAs the visualization shows, the PGD iterates move in the direction of the negative gradient but are forced back into the feasible box whenever a step takes them outside. The algorithm successfully finds the constrained minimum at $(3, 2)$, which is the point within the box closest to the unconstrained minimum $(5, 4)$, as expected for this quadratic objective.Important points about this implementation:Feasibility: Every point in the trajectory (after the initial projection of the start point) lies within the feasible box constraints.Projection Efficiency: The project_onto_box function is computationally inexpensive, involving only element-wise min and max operations. The efficiency of this projection step is significant for the overall performance of PGD.Learning Rate: As with standard gradient descent, the choice of learning rate $\alpha$ is important. A rate that is too large can cause oscillations or divergence, while a rate that is too small leads to slow convergence. Techniques like line search or adaptive learning rates can also be adapted for PGD, though care must be taken regarding the projection step.Convergence: For convex objective functions and convex feasible sets (like our example), PGD is guaranteed to converge to the optimal solution, provided the learning rate is chosen appropriately (e.g., small enough constant rate, or satisfying diminishing step size conditions). The analysis often involves properties like the Lipschitz continuity of the gradient and the non-expansive nature of projection operators onto convex sets.This hands-on example demonstrates the core mechanics of Projected Gradient Descent. While simple, it forms the basis for tackling more complex constrained optimization problems encountered in areas like regularized regression (e.g., LASSO with non-negativity constraints), optimal control, and training models with specific parameter requirements. The primary challenge often lies in designing or finding an efficient projection operator for the specific constraint set $\mathcal{C}$ of interest.