The encoder is the first critical stage in an autoencoder's journey of learning from data. Its primary responsibility is to take the input data, which might be high-dimensional and complex, and transform it into a more compact, lower-dimensional representation. Think of it as an information distiller, tasked with capturing the most salient features of the input while discarding noise or redundancy. This compressed representation is often called the latent space representation or code.
Typically, an encoder is constructed as a sequence of neural network layers. For standard autoencoders working with flat or vector data (like the kind discussed in Chapter 1), these are often fully-connected (or dense) layers. The defining characteristic of these layers within the encoder is that they progressively reduce dimensionality. For example, if your input data has 784 features, the first hidden layer in the encoder might have 256 neurons, the next 128, and so on, until the final layer of the encoder (the bottleneck) outputs the desired lower-dimensional code.
The diagram above illustrates a typical encoder structure, where data flows from a higher-dimensional input through layers with decreasing numbers of neurons, culminating in a lower-dimensional latent representation.
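To make this structure concrete, here is a minimal sketch of such an encoder in PyTorch (any framework would work similarly). The 784 → 256 → 128 sizes follow the example above; the 32-dimensional code at the end is an illustrative choice, not a fixed rule.

```python
import torch
import torch.nn as nn

# A minimal encoder sketch: dense layers with progressively fewer neurons.
# Sizes are illustrative: 784 -> 256 -> 128 -> 32-dimensional latent code.
encoder = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 32),   # bottleneck: outputs the latent code z
)

x = torch.rand(16, 784)   # a batch of 16 flattened inputs (stand-in data)
z = encoder(x)
print(z.shape)            # torch.Size([16, 32])
```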
Each layer in the encoder performs a transformation on its input. This usually involves a linear operation (multiplying the input by a weight matrix and adding a bias vector) followed by a non-linear activation function. Mathematically, for a single layer in the encoder, the output h given an input x′ (which could be the original input x or the output of a previous encoder layer) can be represented as:

h = σ(Wx′ + b)

Here, W is the weight matrix, b is the bias vector, and σ is the activation function. Common choices for activation functions in encoder layers include:

- ReLU (Rectified Linear Unit), which zeroes out negative values and is a common default for hidden layers.
- Sigmoid, which squashes outputs into the range (0, 1).
- Tanh, which squashes outputs into the range (-1, 1).
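As a concrete check on this equation, the sketch below (again PyTorch, with the same illustrative sizes) computes h = σ(Wx′ + b) explicitly from a layer's weight matrix and bias, and confirms it matches applying the layer module followed by the activation.

```python
import torch
import torch.nn as nn

# One encoder layer computed two ways; sizes (784 -> 256) are illustrative.
layer = nn.Linear(784, 256)   # holds the weight matrix W and bias vector b
activation = nn.ReLU()        # the non-linearity sigma

x = torch.rand(8, 784)        # a batch of 8 inputs (stand-in data)

# h = sigma(W x' + b), written out with the layer's own parameters
h_manual = torch.relu(x @ layer.weight.T + layer.bias)

# the same computation using the layer and activation modules
h_module = activation(layer(x))

print(torch.allclose(h_manual, h_module))  # True
```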
The non-linear activation functions are important; without them, a stack of linear layers would just be equivalent to a single linear transformation, severely limiting the complexity of the functions the encoder can learn.
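A quick way to see this collapse, under the same illustrative setup, is to multiply the weight matrices of two bias-free linear layers together and confirm that the single combined linear map produces the same outputs as the stacked pair.

```python
import torch
import torch.nn as nn

# Two stacked linear layers with no activation in between...
lin1 = nn.Linear(784, 256, bias=False)
lin2 = nn.Linear(256, 32, bias=False)

x = torch.rand(4, 784)

# ...are equivalent to a single linear map whose weight is the product W2 @ W1.
W_combined = lin2.weight @ lin1.weight       # shape (32, 784)
y_stacked = lin2(lin1(x))                    # two layers applied in sequence
y_single = x @ W_combined.T                  # one linear map with the product matrix

print(torch.allclose(y_stacked, y_single, atol=1e-5))  # True
```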
The "magic" of the encoder lies in how it learns the weights W and biases b for each of its layers. During the autoencoder's training process, the entire network, including the encoder, is optimized to minimize the reconstruction error (as discussed with the reconstruction loss function in the chapter introduction). This means the encoder isn't just randomly squashing data. Instead, it learns to perform a compression that retains the information most critical for the decoder to reconstruct the original input. The encoder, therefore, learns a meaningful way to map the input x to the latent representation z: z=encoder(x) This learned latent vector z is the compressed essence of the input, which we'll explore further when we discuss the bottleneck layer and how these features can be extracted and used. The quality of this compression is paramount, as it directly influences how well the original data can be reconstructed and, more importantly for our course, how useful these learned features are for other machine learning tasks.