A system of linear equations is a collection of relationships between several unknown quantities. For instance, you might have two unknowns, x1 and x2, linked by two equations:
2x1 + 3x2 = 8
4x1 − x2 = 2

Solving this system means finding a value for x1 and a value for x2 that make both equations true at the same time. While you may have solved small systems like this by hand using substitution or elimination, this approach does not scale well. Imagine a system with hundreds of equations and variables, a common scenario in machine learning. We need a more systematic and computationally friendly method.
This is where linear algebra provides a powerful way to organize the problem. We can rewrite any system of linear equations into the compact and elegant form Ax=b.
Let's look at our example again and separate it into its three main components: the coefficients (2, 3, 4, and −1), the unknown variables (x1 and x2), and the constants on the right-hand side (8 and 2).
The core idea is to group each of these component types into its own structure: the coefficients into a matrix A, the variables into a vector x, and the constants into a vector b.
1. The Coefficient Matrix A
We create a matrix A by arranging the coefficients in the same layout as they appear in the equations. Each row in the matrix corresponds to an equation, and each column corresponds to a variable. For our system, the coefficient matrix A is:
A = [ 2   3 ]
    [ 4  −1 ]

The first row, [2 3], contains the coefficients from the first equation (2x1 + 3x2 = 8). The second row, [4 −1], contains the coefficients from the second equation (4x1 − x2 = 2).
2. The Variable Vector x
Next, we group our unknown variables into a column vector x. The order must match the order of the columns in matrix A. Since our first column in A corresponds to x1 and the second to x2, our vector x is:
x = [ x1 ]
    [ x2 ]

3. The Constant Vector b
Finally, we collect the constants from the right-hand side of the equations into another column vector, b:
b = [ 8 ]
    [ 2 ]

The following diagram shows how the system of equations is translated into these three distinct parts.
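Building A and b as concrete arrays is a one-liner in NumPy. Here is a minimal sketch of the setup (x stays symbolic at this point, since it is the unknown we are solving for):

```python
import numpy as np

# Coefficient matrix A: each row holds one equation's coefficients,
# each column corresponds to one variable (x1, then x2).
A = np.array([[2.0, 3.0],
              [4.0, -1.0]])

# Constant vector b: the right-hand sides of the equations.
b = np.array([8.0, 2.0])

print(A.shape)  # (2, 2): two equations, two variables
print(b.shape)  # (2,): one constant per equation
```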
The components of a system of equations are organized into a coefficient matrix A, a variable vector x, and a constant vector b.
Now we have our matrix equation:
[ 2   3 ] [ x1 ]   [ 8 ]
[ 4  −1 ] [ x2 ] = [ 2 ]

But how do we know this is the same as our original system? We can verify it by performing the matrix-vector multiplication on the left side, as we learned in Chapter 3. Remember, we calculate the dot product of each row of the matrix A with the column vector x:
This multiplication results in a new vector:
[ 2x1 + 3x2 ]
[ 4x1 − x2  ]

And since we state that Ax = b, we are saying:
[ 2x1 + 3x2 ]   [ 8 ]
[ 4x1 − x2  ] = [ 2 ]

For two vectors to be equal, their corresponding elements must be equal. This gives us back our original two equations: 2x1 + 3x2 = 8 and 4x1 − x2 = 2. This confirms that the matrix form Ax = b is a perfectly valid and compact representation of our original system.
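We can run the same check numerically. The pair x1 = 1, x2 = 2 satisfies both equations (2·1 + 3·2 = 8 and 4·1 − 2 = 2, as you can verify by hand), so multiplying A by that vector should reproduce b exactly:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [4.0, -1.0]])
b = np.array([8.0, 2.0])

# A candidate solution: x1 = 1, x2 = 2.
x = np.array([1.0, 2.0])

# The @ operator performs matrix-vector multiplication:
# the dot product of each row of A with x.
result = A @ x
print(result)                    # [8. 2.]
print(np.allclose(result, b))    # True
```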
Translating a system of equations into the form Ax=b is more than just a notational trick. It provides several significant advantages, especially in the context of computing and machine learning.
Conciseness and Scalability: A system with hundreds of equations and variables can be written down just as simply as Ax=b. The underlying matrix A and vectors x and b would be much larger, but the representation remains clean. This allows us to think about the problem at a higher level of abstraction.
A Path to the Solution: This form suggests a way to solve for x. If A, x, and b were just numbers, you would solve for x by dividing b by A. In linear algebra, the equivalent operation is multiplying by the matrix inverse, which we will explore in the next sections.
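To preview that idea, here is a sketch of the "divide by A" analogy in NumPy, using the matrix inverse (covered in detail in the upcoming sections). For our small system the arithmetic works out to x1 = 1, x2 = 2:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [4.0, -1.0]])
b = np.array([8.0, 2.0])

# The matrix analogue of "dividing b by A": multiply b by A's inverse.
A_inv = np.linalg.inv(A)
x = A_inv @ b
print(x)  # [1. 2.]
```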
Computational Efficiency: Modern numerical computing libraries like NumPy are highly optimized for matrix and vector operations. Representing problems in this form allows us to use fast, pre-built functions to find solutions, rather than writing slow, manual loops. Solving Ax = b is a standard operation in these libraries.
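In practice, the standard routine for this is `np.linalg.solve`, which solves Ax = b directly via a factorization rather than forming the inverse explicitly; this is generally faster and more numerically stable than the inverse-based approach:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [4.0, -1.0]])
b = np.array([8.0, 2.0])

# Solve the system Ax = b in one call.
x = np.linalg.solve(A, b)
print(x)  # [1. 2.]

# Sanity check: substituting x back into Ax reproduces b.
print(np.allclose(A @ x, b))  # True
```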
By converting systems of equations into this universal matrix form, we can apply the full power of linear algebra and computation to find solutions efficiently. In the following sections, we will learn about the tools needed to solve for the vector x.
© 2026 ApX Machine Learning