While linear methods like Principal Component Analysis (PCA) are effective for capturing variance and reducing dimensionality when data exhibits primarily linear correlations, they often fall short when dealing with the complexities of many real-world datasets. As noted when discussing the limitations of PCA, these methods fundamentally assume that the data lies on or close to a linear subspace within the high-dimensional space. This assumption frequently breaks down.
Much high-dimensional data, such as images, audio signals, or text embeddings, is better described by the manifold hypothesis. This hypothesis suggests that the data points, despite residing in a very high-dimensional ambient space (like the space of all possible pixel values for an image), actually lie close to a lower-dimensional, non-linear manifold embedded within that space.
Imagine a rolled-up sheet of paper (a "Swiss Roll") in 3D space. The intrinsic dimensionality of the paper's surface is 2D, but it exists within a 3D environment. A linear method like PCA, seeking to find the directions of maximum variance, might simply project the roll onto a 2D plane, effectively squashing it and losing the underlying structure. Points that were far apart along the roll's surface might appear close together in the PCA projection.
Figure: A simplified 3D view of data points lying on a non-linear manifold resembling a 'Swiss Roll'. Linear methods struggle to capture the underlying structure.

Figure: Applying PCA to the 'Swiss Roll' data projects points onto a 2D plane. The inherent structure is lost, and points that were far apart on the roll may become close in the projection.
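To make this concrete, the following sketch uses scikit-learn to generate Swiss Roll data and project it with PCA. The sample count and noise level are illustrative choices, not anything prescribed above:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA

# Generate 3D Swiss Roll data; `t` parameterizes position along the roll
X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)
print(X.shape)  # (1000, 3): 3D ambient space, 2D intrinsic manifold

# Project onto the 2 directions of maximum variance
X_pca = PCA(n_components=2).fit_transform(X)

# Points far apart along the roll (very different values of t) can land
# close together in this linear projection, collapsing the structure.
```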
The goal of representation learning isn't just dimensionality reduction; it's about finding features or factors that effectively capture the underlying structure and variations in the data. Consider image data again. The identity of an object (e.g., a 'cat') remains constant under various transformations like changes in lighting, pose, scale, or translation. These transformations are often highly non-linear in the pixel space.
This is where neural networks, the building blocks of autoencoders, come into play. The power of deep learning models stems largely from their ability to learn complex, non-linear functions. This capability arises from two main components: non-linear activation functions (such as ReLU or sigmoid) applied within each layer, and the composition of many such layers, which stacks simple transformations into highly expressive mappings.
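The sketch below illustrates both components in plain NumPy: a toy two-layer network whose weights are random placeholders, showing how interleaving affine maps with a non-linear activation yields a non-linear function of the input:

```python
import numpy as np

def relu(x):
    # Element-wise non-linearity; without it, stacked layers would
    # collapse into a single linear transformation.
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)

# A toy 2-layer network mapping 3D inputs to a 2D representation.
# Layer widths here are arbitrary, purely for illustration.
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 2)), np.zeros(2)

def encode(x):
    # Composing affine maps with a non-linearity between them
    # produces a non-linear mapping from input to representation.
    h = relu(x @ W1 + b1)
    return h @ W2 + b2

x = rng.normal(size=(5, 3))  # a batch of 5 points in 3D
z = encode(x)
print(z.shape)  # (5, 2)
```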
Techniques like t-SNE and UMAP, introduced earlier, are excellent for visualizing these non-linear structures by creating low-dimensional embeddings that preserve local relationships. However, they don't typically provide an explicit encoder function that maps new data points into the embedding space, nor do they learn features suitable for reconstruction or generative tasks.
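For instance, scikit-learn's TSNE exposes only fit_transform, so embedding a new point requires re-running the entire optimization; there is no learned encoder to apply. The Swiss Roll is reused here purely for illustration:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import TSNE

X, _ = make_swiss_roll(n_samples=500, random_state=0)

# t-SNE produces a 2D embedding for this particular dataset...
X_embedded = TSNE(n_components=2, random_state=0).fit_transform(X)

# ...but, unlike PCA, TSNE provides no separate transform() method,
# so new data points cannot be mapped into the existing embedding.
```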
Therefore, to learn representations that capture the intricate structure of complex data for tasks beyond visualization (like compression, denoising, generation, or transfer learning), we require methods capable of learning powerful non-linear feature extractors. Autoencoders provide a flexible and effective framework for learning such non-linear mappings directly from data in an unsupervised or self-supervised fashion, forming the core subject of the subsequent chapters.
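As a preview of what those chapters develop, here is a minimal PyTorch sketch of the autoencoder pattern: a non-linear encoder compresses the input to a low-dimensional code, a decoder reconstructs it, and training minimizes reconstruction error. The input and layer dimensions are placeholders, not a recommended architecture:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    # Minimal autoencoder: non-linear encoder to a low-dimensional
    # code, and a decoder that reconstructs the input from the code.
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, code_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.randn(16, 784)  # a batch of flattened inputs (placeholder size)
loss = nn.functional.mse_loss(model(x), x)  # reconstruction objective
```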