As information begins its path through the network during forward propagation, the first significant computation within each neuron is a linear transformation. This step involves calculating a weighted sum of all the inputs connected to that neuron, plus a bias term specific to the neuron. Think of it as combining the incoming signals based on their learned importance (weights) and then shifting the result by a certain amount (bias).
Calculating the Weighted Sum for a Single Neuron
Recall from Chapter 1 that a single artificial neuron processes multiple inputs. For a neuron receiving $n$ inputs $x_1, x_2, \dots, x_n$, each input $x_i$ is associated with a corresponding weight $w_i$. The neuron also has a bias term, $b$. The linear transformation combines these elements to produce an intermediate value, often denoted by $z$.
The calculation is performed as follows:
$$z = (w_1 x_1 + w_2 x_2 + \cdots + w_n x_n) + b$$
This can be expressed more compactly using summation notation:
$$z = \sum_{i=1}^{n} (w_i x_i) + b$$
Let's break down the components:
Inputs ($x_i$): These are the feature values from the input data (for the first hidden layer) or the outputs (activations) from the neurons in the previous layer.
Weights ($w_i$): These parameters represent the strength or importance of the connection associated with each input $x_i$. During training, the network learns optimal values for these weights. A large positive weight means the corresponding input strongly excites the neuron, while a large negative weight means it strongly inhibits the neuron. A weight close to zero means the input has little influence.
Bias ($b$): This is an additional parameter associated with the neuron itself, independent of the inputs. It acts like an offset, shifting the result of the weighted sum. The bias allows the neuron to adjust its output independently of the input values, effectively shifting the activation function's curve left or right, which makes it easier for the network to learn patterns.
Output ($z$): This is the result of the linear transformation. It's the raw, aggregated signal before it gets passed through the neuron's activation function. This value $z$ is sometimes referred to as the pre-activation, the logit, or simply the weighted sum.
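To make the formula concrete, here is a minimal Python sketch of the weighted sum for a single neuron. The input values, weights, and bias are illustrative assumptions chosen for this example, not values from any particular dataset or from the book.

```python
# Weighted sum (pre-activation) for a single neuron.
# All numeric values below are illustrative assumptions.
inputs = [2.0, -1.0, 3.0]    # x_1, x_2, x_3
weights = [0.5, -0.2, 0.1]   # w_1, w_2, w_3
bias = 0.4                   # b

# z = sum(w_i * x_i) + b
z = sum(w * x for w, x in zip(weights, inputs)) + bias
print(z)  # approximately 1.9
```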
Vectorized Notation
Writing out and computing these sums term by term quickly becomes unwieldy, especially in networks with many neurons and inputs. Linear algebra provides a much more compact and efficient way to represent and compute this. If we organize the weights into a vector $\mathbf{w}$ and the inputs into a vector $\mathbf{x}$:
$$\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
Then the weighted sum (excluding the bias for a moment) is simply the dot product of the weight vector and the input vector. Treating $\mathbf{w}$ and $\mathbf{x}$ as column vectors (the common convention in deep learning literature), the dot product is written using transpose notation:

$$\mathbf{w}^T \mathbf{x} = \sum_{i=1}^{n} w_i x_i$$
Including the bias, the full linear transformation for a single neuron becomes:
$$z = \mathbf{w}^T \mathbf{x} + b$$
This vector notation is fundamental because it scales efficiently when we consider multiple neurons in a layer, which we'll see involves matrix multiplications.
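As a sketch of the vectorized form, the same computation (using the assumed illustrative values from the earlier example) reduces to a single dot product with NumPy:

```python
import numpy as np

# Same illustrative values as before, now organized as vectors.
x = np.array([2.0, -1.0, 3.0])   # input vector
w = np.array([0.5, -0.2, 0.1])   # weight vector
b = 0.4                          # bias

# z = w^T x + b  (dot product plus bias)
z = w @ x + b                    # equivalently: np.dot(w, x) + b
print(z)                         # approximately 1.9, matching the element-wise loop
```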
Example Calculation
Let's consider a neuron with 3 inputs and specific weights and bias:
Figure: A simple illustration of the weighted sum calculation for one neuron. Inputs are multiplied by their respective weights, summed together, and then the bias is added to produce the pre-activation value $z$.
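The exact numbers from the illustration are not reproduced in the text, so here is a worked example with assumed values: inputs $x_1 = 2$, $x_2 = -1$, $x_3 = 3$, weights $w_1 = 0.5$, $w_2 = -0.2$, $w_3 = 0.1$, and bias $b = 0.4$.

$$z = (0.5)(2) + (-0.2)(-1) + (0.1)(3) + 0.4 = 1.0 + 0.2 + 0.3 + 0.4 = 1.9$$

These are the same assumed values used in the code sketches above, so the results can be cross-checked.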
Across an Entire Layer
This linear transformation happens concurrently for every neuron within a given layer. Critically, each neuron in a layer receives the exact same input vector $\mathbf{x}$ (coming from the previous layer or the initial dataset). However, each neuron $j$ in the layer has its own unique weight vector $\mathbf{w}_j$ and its own unique bias $b_j$.
Therefore, if a layer has $m$ neurons, it will compute $m$ different weighted sums, $z_1, z_2, \dots, z_m$:
Neuron 1: $z_1 = \mathbf{w}_1^T \mathbf{x} + b_1$
Neuron 2: $z_2 = \mathbf{w}_2^T \mathbf{x} + b_2$
...
Neuron m: $z_m = \mathbf{w}_m^T \mathbf{x} + b_m$
Each $z_j$ represents the aggregated input signal for neuron $j$ before non-linearity is introduced. This collection of $z$ values forms the input for the next step: applying the activation function across the layer. We'll explore how to perform these layer-wide calculations efficiently using matrix operations in a later section. For now, understanding this fundamental weighted sum calculation is the essential first step in understanding the forward pass.
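As a minimal sketch of the layer-wide computation (again with assumed illustrative values, here for a layer of two neurons), each neuron applies its own weight vector and bias to the shared input:

```python
import numpy as np

# Shared input vector from the previous layer (illustrative values).
x = np.array([2.0, -1.0, 3.0])

# One weight vector and one bias per neuron; values are assumptions for illustration.
w_list = [np.array([0.5, -0.2, 0.1]),
          np.array([-0.3, 0.8, 0.05])]
b_list = [0.4, -0.1]

# z_j = w_j^T x + b_j for each neuron j in the layer
z = [w @ x + b for w, b in zip(w_list, b_list)]
print(z)  # pre-activations z_1 and z_2 for the two neurons
```

Stacking the per-neuron weight vectors into a single matrix turns this loop into one matrix-vector multiplication, which is the layer-wide formulation covered in the later section on matrix operations.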