Neural networks draw significant inspiration from the way biological brains process information. While the ultimate goal isn't to perfectly replicate the brain, understanding the basic computational unit of the brain, the neuron, provides a useful conceptual starting point for building artificial learning systems.
A biological neuron is a specialized cell designed for information processing and transmission. At a high level, it operates as follows:

Reception: Dendrites receive incoming signals from other neurons. Each connection (a synapse) modulates the strength of the signal it passes along.
Integration: The cell body (soma) combines these incoming signals.
Output: If the combined signal exceeds a certain threshold, the neuron "fires," sending an electrical impulse along its axon to downstream neurons.

A simplified view of information flow in a biological neuron.
This biological process of receiving weighted inputs, integrating them, and producing an output based on a threshold provides the core analogy for the artificial neuron.
An artificial neuron, called a perceptron in its earliest form, is a mathematical function conceived as a simplified model of a biological neuron. It takes multiple input signals, processes them, and produces a single output signal. Here are its key components:
Inputs ($x_1, x_2, \ldots, x_n$): These are the numerical values fed into the neuron. They could be pixel values from an image, features from a dataset (like age or income), or outputs from neurons in a previous layer.
Weights ($w_1, w_2, \ldots, w_n$): Each input $x_i$ is associated with a weight $w_i$. The weight represents the strength or importance of that specific input connection. A larger absolute weight means that the corresponding input has a greater influence on the neuron's output. These weights are the primary parameters that the network learns during training.
Bias ($b$): This is an additional, learnable parameter associated with the neuron itself, not connected to any specific input. The bias acts like an offset, making it easier or harder for the neuron to activate (produce a non-zero output). Think of it as adjusting the activation threshold: with a step activation that fires when $z > 0$, for instance, a bias of $b = 2$ means the weighted sum of the inputs only needs to exceed $-2$ for the neuron to fire. Without a bias, the weighted sum is zero whenever all inputs are zero, forcing the neuron's decision boundary to pass through the origin and limiting its flexibility.
Summation Function: The neuron calculates a weighted sum of its inputs and adds the bias. This is often represented as $z$:

$$z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b = \left( \sum_{i=1}^{n} w_i x_i \right) + b$$

This calculation, $z$, is a linear combination of the inputs plus the bias.
Activation Function ($f$): The result of the summation, $z$, is then passed through an activation function, $f$. This function introduces non-linearity into the model, allowing neural networks to learn complex relationships in data that simple linear models cannot. The final output of the neuron, often denoted as $a$, is:
$$a = f(z) = f\!\left( \left( \sum_{i=1}^{n} w_i x_i \right) + b \right)$$

We will explore specific activation functions like Sigmoid, Tanh, and ReLU in detail later in this chapter. For now, understand that the activation function's role is to transform the linear sum $z$ into the neuron's final output $a$. A short code sketch following the figure below puts these pieces together.
Structure of a single artificial neuron. Inputs ($x_i$) are multiplied by weights ($w_i$) and summed together with a bias ($b$) to produce $z$, which is then passed through an activation function $f$ to generate the output $a$.
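To make the computation concrete, here is a minimal NumPy sketch of a single neuron's forward pass. The input, weight, and bias values are arbitrary illustrative choices, and the sigmoid is used only as a placeholder activation; specific activation functions are examined later in the chapter.

```python
import numpy as np

def sigmoid(z):
    """Example activation function: squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_forward(x, w, b, f=sigmoid):
    """Single artificial neuron: a = f(sum_i(w_i * x_i) + b)."""
    z = np.dot(w, x) + b  # weighted sum of inputs plus bias
    return f(z)           # activation transforms z into the output a

# Arbitrary illustrative values
x = np.array([0.5, -1.2, 3.0])   # inputs x1, x2, x3
w = np.array([0.8, 0.1, -0.4])   # weights w1, w2, w3
b = 0.25                         # bias

a = neuron_forward(x, w, b)
print(a)  # z = 0.4 - 0.12 - 1.2 + 0.25 = -0.67, so a = sigmoid(-0.67) ≈ 0.338
```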
While heavily simplified compared to its biological counterpart, this artificial neuron model captures the essential idea of integrating weighted inputs and producing an output based on the result. It's a fundamental computational unit. By connecting many such units together in layers, we can build powerful neural networks capable of learning complex patterns, which is the subject of the rest of this course.
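As a brief preview of that idea, several neurons sharing the same inputs (a layer) can be evaluated in a single step as a matrix-vector product, with one row of weights and one bias per neuron. The sketch below uses arbitrary values and the same placeholder sigmoid as above; it is an illustration of the idea, not a full network implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A layer of 4 neurons over 3 inputs: row i of W holds neuron i's weights
W = np.array([[ 0.8,  0.1, -0.4],
              [-0.3,  0.5,  0.2],
              [ 0.0,  0.9, -0.7],
              [ 0.6, -0.2,  0.1]])
b = np.array([0.25, -0.10, 0.00, 0.30])  # one bias per neuron
x = np.array([0.5, -1.2, 3.0])           # shared inputs

# Every neuron's weighted sum at once, then the activation elementwise
a = sigmoid(W @ x + b)
print(a.shape)  # (4,) -- one output per neuron in the layer
```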