At the heart of deep learning lies the concept of the artificial neuron, a mathematical function loosely inspired by biological neurons. Think of it as the basic processing unit within a larger network. An artificial neuron receives one or more inputs, performs a computation, and produces an output.
Let's break down its components:
- Inputs ($x_1, x_2, \dots, x_n$): These are the numerical values fed into the neuron. They could be pixel values from an image, numerical features from a dataset, or outputs from neurons in a previous layer.
- Weights ($w_1, w_2, \dots, w_n$): Each input $x_i$ is associated with a weight $w_i$. The weight determines the influence or strength of the connection for that specific input. Inputs with larger weights will have a greater impact on the neuron's output. These weights are the primary parameters the network learns during training.
- Bias ($b$): This is an additional parameter added to the weighted sum of inputs. The bias shifts the input to the neuron's activation function, providing more flexibility in the model; in particular, it allows the neuron to produce a non-zero output even when all inputs are zero. Like the weights, the bias is learned during training.
- Summation: The neuron computes a weighted sum of its inputs plus the bias. Mathematically, this is often represented as:
$$z = (w_1 x_1 + w_2 x_2 + \dots + w_n x_n) + b = \sum_{i=1}^{n} w_i x_i + b$$
- Activation Function ($f$): The result of the summation, $z$, is then passed through a non-linear activation function $f$. This function introduces non-linearity into the model, which is essential for learning complex patterns that cannot be captured by simple linear transformations. Without non-linear activation functions, a deep neural network would behave like a single linear layer, regardless of its depth. Common examples include ReLU (Rectified Linear Unit), Sigmoid, and Tanh (hyperbolic tangent). The final output of the neuron is $a = f(z)$.
Here is a simple visualization of a single artificial neuron:
*Figure: A single artificial neuron computing a weighted sum of its inputs, adding a bias, and applying an activation function.*
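To make the computation concrete, here is a minimal NumPy sketch of a single neuron's forward computation using ReLU. The input values, weights, and bias below are illustrative, not learned:

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

# Illustrative (not learned) parameters for a neuron with 3 inputs.
x = np.array([0.5, -1.2, 3.0])   # inputs x1, x2, x3
w = np.array([0.8, 0.1, -0.4])   # weights w1, w2, w3
b = 0.25                         # bias

z = np.dot(w, x) + b             # weighted sum plus bias
a = relu(z)                      # neuron output a = f(z)
print(z, a)                      # z ≈ -0.67, so ReLU outputs a = 0.0
```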
Neurons rarely work in isolation. Their power comes from being organized into layers to form an Artificial Neural Network (ANN). A typical feedforward network, the simplest kind, has a distinct structure:
- Input Layer: This is not technically a layer of neurons but rather represents the raw input data fed into the network. The number of "nodes" in this layer corresponds to the number of features in each data sample. For example, if you are classifying 28×28-pixel grayscale images, the input layer would have $28 \times 28 = 784$ nodes.
- Hidden Layers: These layers sit between the input and output layers. Each layer consists of multiple neurons. The term "hidden" signifies that their outputs are not directly observed; they represent intermediate processing stages where the network learns increasingly complex representations or features from the input data. A network can have zero, one, or many hidden layers. Networks with multiple hidden layers are what give "deep" learning its name.
- Output Layer: This final layer produces the network's prediction. The number of neurons and the activation function used in the output layer depend on the type of task (see the sketch after this list):
- For binary classification (e.g., cat vs. dog), you might use a single neuron with a Sigmoid activation function to output a probability between 0 and 1.
- For multi-class classification (e.g., digit recognition 0-9), you might use multiple neurons (one for each class) with a Softmax activation function to output a probability distribution across the classes.
- For regression (predicting a continuous value), you might use a single neuron with no activation function (or a linear activation).
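To illustrate these output-layer conventions, here is a small NumPy sketch of the Sigmoid and Softmax functions mentioned above; the score values are made up for demonstration:

```python
import numpy as np

def sigmoid(z):
    """Maps any real z into (0, 1): a probability for binary classification."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Maps a vector of scores to a probability distribution over classes."""
    exp_z = np.exp(z - np.max(z))  # subtracting the max improves numerical stability
    return exp_z / exp_z.sum()

print(sigmoid(2.0))                         # ~0.88, e.g. "88% probability of cat"
print(softmax(np.array([2.0, 0.5, -1.0])))  # three class probabilities summing to 1
```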
In many standard architectures, especially for tabular data, layers are often fully connected (also called Dense layers in Keras). This means that every neuron in one layer receives input from every neuron in the preceding layer.
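In matrix form, a fully connected layer computes all of its neurons' outputs at once. Assuming the common row-per-neuron convention for the weight matrix $W$ (other texts use the transpose):

$$a = f(Wx + b), \qquad W \in \mathbb{R}^{m \times n}, \quad x \in \mathbb{R}^{n}, \quad b,\, a \in \mathbb{R}^{m},$$

where $n$ is the number of inputs, $m$ is the number of neurons in the layer, and $f$ is applied element-wise.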
Consider a simple network with one input layer (3 features), one hidden layer (4 neurons), and one output layer (2 neurons):
*Figure: A simple fully connected feedforward neural network. Information flows from the input layer, through the hidden layer, to the output layer. Each line represents a weighted connection.*
Information flows through this network in a forward direction, from input to output. Each layer performs its computations on the outputs of the previous layer. This process is called the forward pass, sketched in code below. The arrangement of these neurons and layers defines the network's architecture. Keras provides convenient ways to define these architectures layer by layer, which we'll explore in the next chapter. Understanding this basic structure of neurons and layers is fundamental before you start building models.
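As a closing sketch, here is what the forward pass of the 3-4-2 network above could look like in NumPy. The weights are randomly initialized purely for illustration (a trained network would have learned values), and the choice of ReLU for the hidden layer and Softmax for the output is one plausible configuration, not the only one:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

# Randomly initialized (untrained) parameters for a 3-4-2 network.
W1 = rng.normal(size=(4, 3))  # hidden layer: 4 neurons, each with 3 weights
b1 = np.zeros(4)
W2 = rng.normal(size=(2, 4))  # output layer: 2 neurons, each with 4 weights
b2 = np.zeros(2)

x = np.array([0.2, -0.5, 1.0])  # one input sample with 3 features

# Forward pass: each layer consumes the previous layer's output.
h = relu(W1 @ x + b1)           # hidden activations, shape (4,)
y = softmax(W2 @ h + b2)        # output probabilities, shape (2,)
print(y, y.sum())               # two class probabilities summing to 1.0
```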