As information begins its path through the network during forward propagation, the first significant computation within each neuron is a linear transformation. This step involves calculating a weighted sum of all the inputs connected to that neuron, plus a bias term specific to the neuron. Think of it as combining the incoming signals based on their learned importance (weights) and then shifting the result by a certain amount (bias).
Calculating the Weighted Sum for a Single Neuron
Recall from Chapter 1 that a single artificial neuron processes multiple inputs. For a neuron receiving $n$ inputs $x_1, x_2, \dots, x_n$, each input $x_i$ is associated with a corresponding weight $w_i$. The neuron also has a bias term, $b$. The linear transformation combines these elements to produce an intermediate value, often denoted by $z$.
The calculation is performed as follows:
$$z = (w_1 x_1 + w_2 x_2 + \cdots + w_n x_n) + b$$
This can be expressed more compactly using summation notation:
$$z = \sum_{i=1}^{n} (w_i x_i) + b$$
Let's break down the components:
Inputs ($x_i$): These are the feature values from the input data (for the first hidden layer) or the outputs (activations) from the neurons in the previous layer.
Weights ($w_i$): These parameters represent the strength or importance of the connection associated with each input $x_i$. During training, the network learns optimal values for these weights. A large positive weight means the corresponding input strongly excites the neuron, while a large negative weight means it strongly inhibits the neuron. A weight close to zero means the input has little influence.
Bias ($b$): This is an additional parameter associated with the neuron itself, independent of the inputs. It acts like an offset, shifting the result of the weighted sum. The bias allows the neuron to adjust its output independently of the input values, effectively shifting the activation function's curve left or right, which makes it easier for the network to learn patterns.
Output ($z$): This is the result of the linear transformation. It's the raw, aggregated signal before it gets passed through the neuron's activation function. This value $z$ is sometimes referred to as the pre-activation, the logit, or simply the weighted sum.
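To make the formula concrete, here is a minimal Python sketch of the weighted sum for a single neuron. The input values, weights, and bias are illustrative assumptions chosen for this example, not values from any particular dataset or from the book.

```python
# Weighted sum (pre-activation) for a single neuron.
# All numeric values below are illustrative assumptions.
inputs = [2.0, -1.0, 3.0]    # x_1, x_2, x_3
weights = [0.5, -0.2, 0.1]   # w_1, w_2, w_3
bias = 0.4                   # b

# z = sum(w_i * x_i) + b
z = sum(w * x for w, x in zip(weights, inputs)) + bias
print(z)  # approximately 1.9
```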
Vectorized Notation
Writing out and computing these sums term by term quickly becomes unwieldy, especially in networks with many neurons and inputs. Linear algebra provides a much more compact and efficient way to represent and compute this. If we organize the weights into a vector $\mathbf{w}$ and the inputs into a vector $\mathbf{x}$:
$$\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
Then the weighted sum (excluding the bias for a moment) is simply the dot product of the weight vector and the input vector. Treating $\mathbf{w}$ and $\mathbf{x}$ as column vectors (the common convention in deep learning literature), the dot product is written using transpose notation:

$$\mathbf{w}^T \mathbf{x} = \sum_{i=1}^{n} w_i x_i$$
Including the bias, the full linear transformation for a single neuron becomes:
$$z = \mathbf{w}^T \mathbf{x} + b$$
This vector notation is fundamental because it scales efficiently when we consider multiple neurons in a layer, which we'll see involves matrix multiplications.
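As a sketch of the vectorized form, the same computation (using the assumed illustrative values from the earlier example) reduces to a single dot product with NumPy:

```python
import numpy as np

# Same illustrative values as before, now organized as vectors.
x = np.array([2.0, -1.0, 3.0])   # input vector
w = np.array([0.5, -0.2, 0.1])   # weight vector
b = 0.4                          # bias

# z = w^T x + b  (dot product plus bias)
z = w @ x + b                    # equivalently: np.dot(w, x) + b
print(z)                         # approximately 1.9, matching the element-wise loop
```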
Example Calculation
Let's consider a neuron with 3 inputs and specific weights and bias:
Figure: A simple illustration of the weighted sum calculation for one neuron. Inputs are multiplied by their respective weights, summed together, and then the bias is added to produce the pre-activation value $z$.
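The exact numbers from the illustration are not reproduced in the text, so here is a worked example with assumed values: inputs $x_1 = 2$, $x_2 = -1$, $x_3 = 3$, weights $w_1 = 0.5$, $w_2 = -0.2$, $w_3 = 0.1$, and bias $b = 0.4$.

$$z = (0.5)(2) + (-0.2)(-1) + (0.1)(3) + 0.4 = 1.0 + 0.2 + 0.3 + 0.4 = 1.9$$

These are the same assumed values used in the code sketches above, so the results can be cross-checked.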
Across an Entire Layer
This linear transformation happens concurrently for every neuron within a given layer. Critically, each neuron in a layer receives the exact same input vector $\mathbf{x}$ (coming from the previous layer or the initial dataset). However, each neuron $j$ in the layer has its own unique weight vector $\mathbf{w}_j$ and its own unique bias $b_j$.
Therefore, if a layer has $m$ neurons, it will compute $m$ different weighted sums, $z_1, z_2, \dots, z_m$:
Neuron 1: $z_1 = \mathbf{w}_1^T \mathbf{x} + b_1$
Neuron 2: $z_2 = \mathbf{w}_2^T \mathbf{x} + b_2$
...
Neuron m: $z_m = \mathbf{w}_m^T \mathbf{x} + b_m$
Each $z_j$ represents the aggregated input signal for neuron $j$ before non-linearity is introduced. This collection of $z$ values forms the input for the next step: applying the activation function across the layer. We'll explore how to perform these layer-wide calculations efficiently using matrix operations in a later section. For now, understanding this fundamental weighted sum calculation is the essential first step in understanding the forward pass.
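As a minimal sketch of the layer-wide computation (again with assumed illustrative values, here for a layer of two neurons), each neuron applies its own weight vector and bias to the shared input:

```python
import numpy as np

# Shared input vector from the previous layer (illustrative values).
x = np.array([2.0, -1.0, 3.0])

# One weight vector and one bias per neuron; values are assumptions for illustration.
w_list = [np.array([0.5, -0.2, 0.1]),
          np.array([-0.3, 0.8, 0.05])]
b_list = [0.4, -0.1]

# z_j = w_j^T x + b_j for each neuron j in the layer
z = [w @ x + b for w, b in zip(w_list, b_list)]
print(z)  # pre-activations z_1 and z_2 for the two neurons
```

Stacking the per-neuron weight vectors into a single matrix turns this loop into one matrix-vector multiplication, which is the layer-wide formulation covered in the later section on matrix operations.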