While the biological neuron provides a fascinating starting point, building computational models requires a more precise, mathematical abstraction. Let's break down the standard model of an artificial neuron, often called a unit or node within a network. This model captures the essence of the biological neuron's signal processing in a simplified, computationally tractable form.
Think of an artificial neuron as a processing unit that receives multiple inputs, performs a calculation, and produces a single output. Here are the fundamental components:
An artificial neuron receives one or more input signals. These inputs, denoted as $x_1, x_2, \ldots, x_n$, represent the features or information fed into the neuron. For example, in an image classification task, these inputs could be the pixel values of a small patch of the image.
Each input connection has an associated weight, denoted as $w_1, w_2, \ldots, w_n$. These weights are crucial parameters that the network learns during the training process. A weight signifies the importance or strength of its corresponding input signal. A large positive weight means the input strongly excites the neuron, while a large negative weight means the input strongly inhibits it. A weight close to zero indicates the input has little effect on the neuron's output.
The first step inside the neuron is to compute a weighted sum of all its inputs. This aggregates the influence of all input signals, modulated by their respective weights. Mathematically, this is often represented as:
$$z = (w_1 x_1 + w_2 x_2 + \cdots + w_n x_n) + b$$

This is essentially a linear combination of the inputs.
Notice the additional term $b$ in the summation. This is the bias term. You can think of the bias as a way to shift the activation function's trigger point, making it easier or harder for the neuron to activate (produce a non-zero output). Alternatively, it can be viewed as a weight associated with a constant input of 1 ($x_0 = 1$, $w_0 = b$). Without a bias, the weighted sum $w_1 x_1 + \cdots + w_n x_n$ would be zero whenever all inputs are zero, forcing the function the neuron computes to pass through the origin and limiting its flexibility. The bias allows the neuron to model relationships that do not necessarily pass through the origin. Like the weights, the bias is a learnable parameter adjusted during training.
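To make this concrete, here is a minimal Python sketch of the summation step (using NumPy; the specific input, weight, and bias values are made up purely for illustration). It also shows the equivalent view of the bias as the weight on a constant input of 1:

```python
import numpy as np

# Illustrative values only: three inputs, their weights, and a bias
x = np.array([0.5, -1.2, 3.0])   # input signals x1, x2, x3
w = np.array([0.8, 0.1, -0.4])   # learned weights w1, w2, w3
b = 0.25                         # learned bias

# Weighted sum plus bias: z = w1*x1 + w2*x2 + w3*x3 + b
z = np.dot(w, x) + b

# Equivalent view: fold the bias into a weight on a constant input of 1
x_aug = np.concatenate(([1.0], x))   # prepend x0 = 1
w_aug = np.concatenate(([b], w))     # prepend w0 = b
z_aug = np.dot(w_aug, x_aug)

print(z, z_aug)   # both expressions yield the same value
```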
The result of the summation, $z$, is then passed through an activation function, often denoted by $g(\cdot)$. This function introduces non-linearity into the neuron's output.
$$\text{output} = a = g(z) = g\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

Why is non-linearity important? If neurons only performed linear transformations (like the weighted sum), stacking multiple layers of neurons would still only result in a linear transformation overall. This would severely limit the complexity of the functions the network could learn. Biological neurons exhibit non-linear firing behavior, and activation functions mimic this. Common examples include the Sigmoid, Tanh, and ReLU functions, which we will explore in detail in the next chapter. For now, understand that the activation function determines the final output signal passed on to other neurons or used as the final network output.
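Putting the pieces together, the sketch below computes a single neuron's output $a = g(z)$ in Python. The sigmoid is chosen here purely as one illustrative activation, and the `neuron_forward` helper and all numeric values are assumptions for this example, not part of the model's definition:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_forward(x, w, b, activation=sigmoid):
    """Compute a = g(w·x + b) for a single artificial neuron."""
    z = np.dot(w, x) + b    # weighted sum plus bias
    return activation(z)    # non-linear activation

# Illustrative values only
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
b = 0.25

a = neuron_forward(x, w, b)
print(a)   # the neuron's output, here a value between 0 and 1
```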
We can visualize this mathematical model as follows:
An illustration of the artificial neuron model. Inputs ($x_i$) are multiplied by weights ($w_i$) and summed together with a bias ($b$) to produce $z$, which is then passed through an activation function $g(\cdot)$ to yield the final output $a$.
This mathematical model, simple yet powerful, forms the basis of almost all neural networks. The Perceptron, which we'll discuss next, is essentially this model with a specific type of activation function (a step function). By connecting many such neurons together in layers, we can build complex networks capable of learning intricate patterns in data.
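As a small preview, swapping the activation in the sketch above for a step function gives the thresholding behavior that the Perceptron section will formalize (the values are again purely illustrative):

```python
import numpy as np

def step(z):
    """Step activation: outputs 1 if z >= 0, otherwise 0."""
    return 1.0 if z >= 0 else 0.0

# Same neuron as before, but with a step activation (illustrative values)
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
b = 0.25

z = np.dot(w, x) + b
print(step(z))   # the neuron either fires (1) or stays silent (0)
```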