Having established that Multi-Layer Perceptrons (MLPs) use stacked layers to learn complex patterns, let's examine the specific roles these layers play. In a typical feedforward neural network, information flows in one direction, passing sequentially through distinct types of layers: the input layer, one or more hidden layers, and the output layer. Understanding the function of each is fundamental to designing and interpreting network behavior.
Input Layer: The Gateway for Data
The input layer is the network's entry point. Its primary function is to receive the raw input data, which is typically represented as a feature vector. Each neuron, or node, in the input layer usually corresponds to a single feature in your dataset.
- Structure: The number of neurons in the input layer is determined by the dimensionality of the input data. For example, if you are working with tabular data that has 10 features per sample, the input layer will have 10 neurons. If you are processing flattened images of size 28×28 pixels, the input layer would have 28 × 28 = 784 neurons.
- Function: The input layer doesn't perform any computation in the traditional sense (no weighted sums or activation functions are applied here). It simply acts as a conduit, passing the feature values from the input sample to the first hidden layer. Think of it as distributing the initial information into the network.
- Data Representation: The data fed into this layer must be numerical and is often preprocessed (e.g., scaled or normalized), since subsequent layers will perform mathematical operations on these values. A minimal sketch of this preparation follows this list.
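To make the idea concrete, here is a minimal NumPy sketch of preparing one sample for a 784-neuron input layer. The random 28×28 image and the [0, 255] pixel range are illustrative assumptions, not details from the text above.

```python
import numpy as np

# Hypothetical 28x28 grayscale image with pixel values in [0, 255].
image = np.random.randint(0, 256, size=(28, 28))

# Flatten to a 784-dimensional feature vector and scale to [0, 1],
# so each of the 784 input-layer neurons receives one feature value.
x = image.reshape(-1).astype(np.float64) / 255.0

print(x.shape)  # (784,) -- one value per input neuron
```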
Hidden Layers: The Computational Core
Between the input and output layers lie the hidden layers. These are the workhorses of the neural network, where the actual learning and feature extraction take place. They are called "hidden" because their outputs are intermediate representations that are never directly observed as the network's input or its final prediction.
- Structure: A network can have one or many hidden layers. Networks with multiple hidden layers are considered "deep". The number of neurons in each hidden layer is a design choice, a hyperparameter that influences the model's capacity to learn.
- Function: Each neuron in a hidden layer receives inputs from all neurons in the previous layer. It computes a weighted sum of these inputs, adds a bias term, and then applies a non-linear activation function (such as Sigmoid, Tanh, or ReLU, as discussed earlier in this chapter). This non-linearity is essential: without it, stacking multiple layers would be mathematically equivalent to a single linear layer, limiting the network's ability to model complex, non-linear relationships in the data. A NumPy sketch of this computation follows this list.
$$a_j^{(l)} = f\!\left(\sum_k w_{jk}^{(l)} a_k^{(l-1)} + b_j^{(l)}\right)$$

Here, $a_j^{(l)}$ is the activation of the $j$-th neuron in layer $l$, $f$ is the activation function, $w_{jk}^{(l)}$ is the weight connecting the $k$-th neuron in layer $l-1$ to the $j$-th neuron in layer $l$, $a_k^{(l-1)}$ is the activation of the $k$-th neuron in the previous layer $(l-1)$, and $b_j^{(l)}$ is the bias for the $j$-th neuron in layer $l$.
- Feature Hierarchy: In deep networks, earlier hidden layers tend to learn simple features (like edges or textures in an image), while later hidden layers combine these simple features to learn more complex and abstract representations (like shapes, objects, or concepts).
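The equation above vectorizes naturally: stacking the weights $w_{jk}^{(l)}$ into a matrix turns the per-neuron sum into a single matrix-vector product. The following NumPy sketch shows one possible implementation; the function name layer_forward, the 784-to-128 layer sizes, and the random untrained weights are assumptions made purely for illustration.

```python
import numpy as np

def relu(z):
    """ReLU activation, applied element-wise: max(0, z)."""
    return np.maximum(0.0, z)

def layer_forward(a_prev, W, b, activation=relu):
    """One fully connected layer: a^(l) = f(W a^(l-1) + b).

    a_prev : activations of layer l-1, shape (n_prev,)
    W      : weights, shape (n_curr, n_prev); W[j, k] is w_jk^(l)
    b      : biases, shape (n_curr,)
    """
    z = W @ a_prev + b    # weighted sum plus bias, for every neuron j at once
    return activation(z)  # element-wise non-linearity

# Illustrative sizes: a 784-dimensional input feeding a 128-neuron hidden layer.
rng = np.random.default_rng(0)
a0 = rng.random(784)                       # stand-in for the input vector
W1 = rng.normal(0, 0.01, size=(128, 784))  # small random (untrained) weights
b1 = np.zeros(128)

a1 = layer_forward(a0, W1, b1)
print(a1.shape)  # (128,)
```

Computing a whole layer as one matrix-vector product, rather than looping over neurons, is how such layers are typically implemented in practice.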
Output Layer: Delivering the Result
The final layer in the network is the output layer. Its purpose is to produce the network's prediction based on the processed information from the hidden layers.
- Structure: The number of neurons in the output layer depends directly on the specific task the network is designed for:
  - Binary Classification: Typically uses a single output neuron with a Sigmoid activation function. The output is a probability between 0 and 1, indicating the likelihood of belonging to the positive class.
  - Multi-class Classification: Uses N output neurons, where N is the number of distinct classes. A Softmax activation function is commonly applied across these neurons; it ensures the outputs are probabilities that sum to 1, representing a probability distribution over the classes.
  - Regression: Usually has a single output neuron with a linear activation function (or no activation function at all), allowing the network to output a continuous numerical value. For multi-target regression, multiple output neurons (one per target value) with linear activations would be used.
- Function: Neurons in the output layer perform the same computation as hidden-layer neurons (weighted sum plus bias, followed by an activation function). The choice of activation function here is critical, however, because it shapes the final output to match the problem's requirements (e.g., probabilities for classification, continuous values for regression). A sketch comparing these output activations follows this list.
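As an illustration of how these output activations differ, the sketch below applies Sigmoid, Softmax, and a linear (identity) output to hand-picked example scores; the specific logit values are arbitrary assumptions chosen for the demonstration.

```python
import numpy as np

def sigmoid(z):
    """Squashes a raw score into (0, 1): a binary-class probability."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Turns N raw scores into a probability distribution over N classes."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

# Raw pre-activation scores ("logits") an output layer might produce.
binary_logit = np.array([0.8])
multiclass_logits = np.array([2.0, 1.0, 0.1])
regression_output = np.array([3.7])  # linear output: the score is the answer

print(sigmoid(binary_logit))       # ~[0.69]: P(positive class)
print(softmax(multiclass_logits))  # three probabilities summing to 1
print(regression_output)           # an unbounded continuous value
```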
Visualizing the Structure
The following diagram illustrates a simple feedforward neural network with one input layer, two hidden layers, and one output layer. Information flows from left to right.
Figure: A conceptual representation of a feedforward neural network, showing the flow of information from the input layer, through intermediate hidden layers where computations and non-linear transformations occur, to the output layer that produces the final prediction. Dotted lines represent full connectivity between adjacent layers (connections not individually drawn, for clarity).
In summary, the input layer receives data, hidden layers perform non-linear transformations and learn feature representations, and the output layer produces the final prediction in a format suitable for the task. The design of these layers, including the number of layers, the number of neurons per layer, and the activation functions used, constitutes the network's architecture, which we will explore further in the next section.
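To tie the three layer types together, here is a minimal end-to-end forward pass through the kind of network sketched above (one input layer, two hidden layers, one output layer). The 784 → 128 → 64 → 10 sizes, the mlp_forward name, and the random untrained parameters are illustrative assumptions rather than a prescribed architecture.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def mlp_forward(x, params):
    """Forward pass: input -> two hidden layers (ReLU) -> softmax output."""
    W1, b1, W2, b2, W3, b3 = params
    a1 = relu(W1 @ x + b1)        # hidden layer 1: learned features
    a2 = relu(W2 @ a1 + b2)       # hidden layer 2: more abstract features
    return softmax(W3 @ a2 + b3)  # output layer: class probabilities

# Illustrative architecture: 784 inputs -> 128 -> 64 -> 10 classes.
rng = np.random.default_rng(42)
params = (
    rng.normal(0, 0.01, (128, 784)), np.zeros(128),
    rng.normal(0, 0.01, (64, 128)),  np.zeros(64),
    rng.normal(0, 0.01, (10, 64)),   np.zeros(10),
)

probs = mlp_forward(rng.random(784), params)
print(probs.shape, probs.sum())  # (10,) 1.0 -- a distribution over 10 classes
```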