We've seen that the single-layer Perceptron, while foundational, has a significant limitation: it can only learn linearly separable patterns. Problems like the classic XOR (exclusive OR) function are beyond its capabilities because the classes (0 and 1 outputs) cannot be separated by a single straight line in the input space. To handle such non-linear relationships, we need to introduce more complexity to our network structure. This is where Multi-Layer Perceptrons (MLPs) come in.
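To make the XOR problem concrete, the short snippet below simply prints its truth table. The class-1 points (0, 1) and (1, 0) sit on one diagonal of the unit square and the class-0 points (0, 0) and (1, 1) on the other, so no single straight line can place all the 1s on one side and all the 0s on the other.

```python
# XOR truth table: the output is 1 only when exactly one input is 1.
# The two classes lie on opposite diagonals of the unit square, which is
# why a single linear boundary cannot separate them.
xor_data = [
    ((0, 0), 0),
    ((0, 1), 1),
    ((1, 0), 1),
    ((1, 1), 0),
]

for (x1, x2), label in xor_data:
    print(f"x1={x1}, x2={x2} -> XOR={label}")
```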
The core idea behind MLPs is simple yet effective: insert one or more layers of neurons between the input layer and the output layer. These intermediate layers are called hidden layers. They are "hidden" because their outputs are not directly observed as the final result of the network; instead, they serve as inputs to subsequent layers.
An MLP consists of an input layer that receives the data, one or more hidden layers, and an output layer that produces the network's final prediction.
Information in an MLP typically flows in one direction: from the input layer, through the hidden layer(s), to the output layer. This type of architecture is known as a feedforward network. There are no loops or connections that send information backward within the layer structure during the forward pass (prediction phase). Generally, each neuron in a layer is connected to every neuron in the subsequent layer, forming what are called fully connected layers or dense layers.
A simple Multi-Layer Perceptron with one input layer (2 neurons), one hidden layer (3 neurons), and one output layer (1 neuron). Connections indicate the feedforward flow of information.
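As a sketch of this feedforward flow, the NumPy snippet below runs a forward pass through the 2-3-1 architecture described above. The weight values and the sigmoid activation are illustrative placeholders chosen for the example, not parameters learned from data.

```python
import numpy as np

def sigmoid(z):
    # Non-linear activation (covered in detail in the next chapter).
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative (untrained) parameters for a 2-3-1 MLP.
W_hidden = np.array([[ 0.5, -0.3],
                     [ 0.8,  0.2],
                     [-0.6,  0.9]])      # shape (3, 2): 3 hidden neurons, 2 inputs
b_hidden = np.array([0.1, -0.2, 0.05])

W_output = np.array([[0.7, -0.4, 0.6]])  # shape (1, 3): 1 output neuron, 3 hidden
b_output = np.array([0.0])

def forward(x):
    # Hidden layer: weighted sum of the inputs followed by a non-linear activation.
    h = sigmoid(W_hidden @ x + b_hidden)
    # Output layer: combines the hidden-layer activations into the final output.
    y = sigmoid(W_output @ h + b_output)
    return y

x = np.array([1.0, 0.0])   # a single 2-dimensional input
print(forward(x))          # a value in (0, 1); roughly 0.6 for these placeholder weights
```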
How do hidden layers solve the problem faced by single-layer perceptrons? The key lies in the combination of multiple layers and non-linear activation functions (which we will examine in detail in the next chapter).
Imagine that each neuron in the first hidden layer learns a simple linear boundary, much as a single perceptron does. The outputs of these neurons, after being transformed by a non-linear activation function, then become the inputs to the next layer, whose neurons can combine these non-linear transformations.
Effectively, the hidden layers learn to transform the input data into a new representation space. In this transformed space, a problem that was not linearly separable in the original inputs can become linearly separable. For the XOR problem, a hidden layer can learn a representation in which the output neuron can separate the two classes with a single straight line. Adding depth (more layers) allows the network to learn increasingly complex functions and hierarchical features from the data.
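One way to see this for XOR is to set the weights by hand (a sketch using a simple step activation rather than trained parameters): one hidden neuron behaves like OR, the other like AND, and the output neuron fires only when OR is active and AND is not, which is a linearly separable decision in the hidden-layer space.

```python
import numpy as np

def step(z):
    # Threshold activation: 1 if the weighted sum is positive, else 0.
    return (z > 0).astype(float)

# Hand-picked weights (not learned): hidden neuron 1 computes OR of the inputs,
# hidden neuron 2 computes AND of the inputs.
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])

# Output neuron fires when OR is active and AND is not: in the hidden
# representation this is a simple, linearly separable decision.
W_output = np.array([[1.0, -1.0]])
b_output = np.array([-0.5])

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W_hidden @ np.array(x, dtype=float) + b_hidden)
    y = step(W_output @ h + b_output)
    print(x, "->", int(y[0]))   # prints 0, 1, 1, 0: the XOR function
```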
For an MLP to learn non-linear functions, the activation functions used in the hidden neurons must be non-linear (e.g., Sigmoid, Tanh, ReLU). If only linear activation functions were used throughout the network, the entire MLP, no matter how many layers it had, would mathematically collapse into an equivalent single-layer linear model, bringing us back to the limitations of the Perceptron.
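A quick numerical check of this collapse, assuming purely linear layers with no activation between them: composing two linear layers produces exactly the same mapping as a single linear layer whose weights and bias are merged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no non-linear activation between them.
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)

x = rng.normal(size=2)

# Forward pass through the two linear layers.
two_layer = W2 @ (W1 @ x + b1) + b2

# The same mapping expressed as a single linear layer with merged parameters.
W_merged = W2 @ W1
b_merged = W2 @ b1 + b2
one_layer = W_merged @ x + b_merged

print(np.allclose(two_layer, one_layer))  # True: the stack collapses to one linear layer
```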
In summary, Multi-Layer Perceptrons overcome the constraints of single-layer models by introducing hidden layers. These layers, combined with non-linear activation functions, enable MLPs to learn complex, non-linear mappings between inputs and outputs, making them powerful tools for a wide range of machine learning tasks. They form the basis for many more advanced deep learning architectures.