Let's put the theory into practice. One of the most direct applications of solving linear systems in machine learning is determining the optimal parameters (coefficients) for models like linear regression. We want to find the best-fit line or hyperplane through our data points.
Consider a simple linear regression scenario. We have a set of data points (x_i, y_i), where x_i represents the features and y_i represents the target value. For a model with multiple features, x_i would be a vector. Our goal is to find a set of coefficients, let's call them θ (often represented by x in the Ax = b form), such that the model's predictions, ŷ = Xθ, are as close as possible to the actual target values y.
The standard approach in linear regression involves minimizing the sum of squared errors between predictions and actual values. This minimization problem leads directly to a system of linear equations known as the Normal Equations:
X^T X θ = X^T y
Here, X is the design matrix containing one row of feature values per data point (plus a leading column of ones for the intercept term), θ is the vector of coefficients we want to find, and y is the vector of actual target values.
Notice that this equation has the familiar form Ax = b, where A = X^T X, x = θ, and b = X^T y. Assuming the matrix A = X^T X is invertible, we can solve for θ using the matrix inverse:
θ = (X^T X)^(-1) X^T y
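For completeness, here is a brief sketch of where the Normal Equations come from. Writing the sum of squared errors as J(θ) = (Xθ − y)^T (Xθ − y), its gradient with respect to θ is ∇J(θ) = 2 X^T (Xθ − y). Setting the gradient to zero gives X^T X θ = X^T y, which is exactly the system above; the inverse (when it exists) then isolates θ.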
Let's work through an example using NumPy. Suppose we have some data representing, for instance, the square footage of houses and their corresponding prices. We want to find the linear model price = intercept + coefficient * square_footage that best fits this data.
First, let's define our sample data. We'll use two features for a slightly more complex example (e.g., square footage and number of bedrooms) to predict the price.
import numpy as np
# Sample data: [Square Footage, Number of Bedrooms]
X_features = np.array([
    [1500, 3],
    [2000, 4],
    [1200, 2],
    [1800, 3],
    [2500, 4]
])
# Corresponding house prices (in $1000s)
y_target = np.array([300, 450, 250, 400, 550])
# Add a column of ones to X_features for the intercept term
# This creates our design matrix X
X_design = np.hstack([np.ones((X_features.shape[0], 1)), X_features])
print("Design Matrix X (with intercept column):\n", X_design)
print("\nTarget Vector y:\n", y_target)
Now, we apply the Normal Equation formula θ = (X^T X)^(-1) X^T y.
# Calculate X transpose X (XTX)
XTX = X_design.T @ X_design # Using the @ operator for matrix multiplication
# Calculate X transpose y (XTy)
XTy = X_design.T @ y_target
print("\nX^T * X:\n", XTX)
print("\nX^T * y:\n", XTy)
# Calculate the inverse of XTX
XTX_inv = np.linalg.inv(XTX)
print("\nInverse of (X^T * X):\n", XTX_inv)
# Calculate the coefficients theta
theta = XTX_inv @ XTy
print("\nCalculated Coefficients (theta) using Inverse:\n", theta)
print(f"Intercept: {theta[0]:.2f}")
print(f"Coefficient for Sq Footage: {theta[1]:.2f}")
print(f"Coefficient for Bedrooms: {theta[2]:.2f}")
The resulting theta vector contains the intercept and the coefficients for each feature (square footage and number of bedrooms).
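As a quick sanity check, you can compare the model's predictions against the actual prices. The following is a minimal sketch that assumes X_design, y_target, and theta from the snippets above are still defined:
# Predictions from the fitted model for each house
y_pred = X_design @ theta
# Residuals: how far each prediction is from the actual price
residuals = y_target - y_pred
print("Predictions:\n", y_pred)
print("Residuals:\n", residuals)
print("Sum of squared errors:", np.sum(residuals ** 2))
The sum of squared errors printed here is the quantity the Normal Equations minimize, so no other choice of theta can make it smaller on this data.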
Using np.linalg.solve()
While calculating the inverse works, it's often less numerically stable and computationally more expensive than directly solving the system Ax = b. NumPy provides the np.linalg.solve(A, b) function, which solves the system without explicitly computing the inverse. It's generally the preferred method.
Let's solve (X^T X)θ = X^T y using np.linalg.solve(). Here, A = X^T X and b = X^T y.
# Solve the system (XTX) * theta = XTy directly
theta_solve = np.linalg.solve(XTX, XTy)
print("\nCalculated Coefficients (theta) using np.linalg.solve():\n", theta_solve)
print(f"Intercept: {theta_solve[0]:.2f}")
print(f"Coefficient for Sq Footage: {theta_solve[1]:.2f}")
print(f"Coefficient for Bedrooms: {theta_solve[2]:.2f}")
You should see that the coefficients obtained using np.linalg.solve() are virtually identical to those calculated using the explicit inverse method. However, for larger or more complex systems, np.linalg.solve() offers better performance and numerical accuracy.
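Keep in mind that np.linalg.solve() still requires X^T X to be non-singular; if two columns of the design matrix are exactly collinear, it fails just as the explicit inverse does. For least-squares problems specifically, np.linalg.lstsq() is another option worth knowing: it works directly on X and y without ever forming X^T X and also handles rank-deficient design matrices. A minimal sketch, reusing X_design and y_target from above:
# Solve the least-squares problem directly on the design matrix
# (returns the solution, residuals, rank of X, and singular values)
theta_lstsq, res, rank, sv = np.linalg.lstsq(X_design, y_target, rcond=None)
print("Coefficients via np.linalg.lstsq():\n", theta_lstsq)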
Let's simplify to one feature (e.g., square footage) to visualize the result.
import numpy as np
# Use only square footage
X_features_simple = np.array([1500, 2000, 1200, 1800, 2500])[:, np.newaxis] # Ensure it's a column vector
y_target_simple = np.array([300, 450, 250, 400, 550])
# Add intercept column
X_design_simple = np.hstack([np.ones((X_features_simple.shape[0], 1)), X_features_simple])
# Solve using np.linalg.solve
XTX_simple = X_design_simple.T @ X_design_simple
XTy_simple = X_design_simple.T @ y_target_simple
theta_simple = np.linalg.solve(XTX_simple, XTy_simple)
print("\nCoefficients for Simple Model (Intercept, Sq Footage Coeff):\n", theta_simple)
# Generate points for the regression line
x_line = np.linspace(1100, 2600, 100) # Range covering our data
y_line = theta_simple[0] + theta_simple[1] * x_line
# Create Plotly chart data
plot_data = {
    "layout": {
        "title": "Linear Regression Fit (Price vs. Square Footage)",
        "xaxis": {"title": "Square Footage"},
        "yaxis": {"title": "Price ($1000s)"},
        "hovermode": "closest",
        "template": "plotly_white"  # Cleaner look for web
    },
    "data": [
        {
            "type": "scatter",
            "mode": "markers",
            "x": X_features_simple.flatten().tolist(),
            "y": y_target_simple.tolist(),
            "name": "Data Points",
            "marker": {"color": "#228be6", "size": 10}  # blue
        },
        {
            "type": "scatter",
            "mode": "lines",
            "x": x_line.tolist(),
            "y": y_line.tolist(),
            "name": "Regression Line",
            "line": {"color": "#f03e3e", "width": 3}  # red
        }
    ]
}
Linear regression line fitted to sample data using coefficients derived from solving the Normal Equations.
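If you are working in a local Python session rather than viewing the interactive chart, a similar figure can be produced with matplotlib. This is just an illustrative sketch (it assumes matplotlib is installed, which is not otherwise required in this section) and reuses the arrays computed above:
import matplotlib.pyplot as plt
# Scatter the raw data and overlay the fitted regression line
plt.scatter(X_features_simple.flatten(), y_target_simple, color="#228be6", label="Data Points")
plt.plot(x_line, y_line, color="#f03e3e", linewidth=3, label="Regression Line")
plt.xlabel("Square Footage")
plt.ylabel("Price ($1000s)")
plt.title("Linear Regression Fit (Price vs. Square Footage)")
plt.legend()
plt.show()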
This hands-on section demonstrates how solving systems of linear equations is a practical tool for finding parameters in machine learning models. By representing the problem in matrix form (Ax = b, or more specifically, (X^T X)θ = X^T y), we can leverage efficient numerical libraries like NumPy to find the optimal coefficients, either through matrix inversion or, preferably, using direct solvers like np.linalg.solve().