Standard transfer learning, involving fine-tuning a pre-trained model, typically assumes a reasonable amount of labeled data for the target task. However, in many practical scenarios, acquiring large labeled datasets is infeasible due to cost, time, or the inherent rarity of certain categories. Imagine needing to identify a newly discovered species of bird from only a handful of photographs or adapting a medical imaging system to recognize a rare condition with just a few patient examples. This is where Few-Shot Learning (FSL) becomes essential.
FSL tackles the problem of learning to recognize new classes given only a very small number of labeled examples, often just one or five per class. Formally, this is often framed as an N-way K-shot classification problem: the model is given K labeled examples (the "support set") for each of N new classes it hasn't seen during initial training, and its goal is to correctly classify new, unlabeled examples (the "query set") belonging to one of these N classes. When K=1, it's called one-shot learning.
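To make the setup concrete, the sketch below lays out the tensors for a single hypothetical 5-way 1-shot episode; the class count, shot count, query size, and image resolution are arbitrary illustrative choices, not values prescribed by any particular benchmark.

```python
import torch

# A single N-way K-shot episode: here N=5 classes, K=1 support image per class,
# and 15 query images per class to evaluate on (all numbers are illustrative).
N, K, Q = 5, 1, 15

# Support set: the N*K labeled images the model is allowed to "look at".
support_images = torch.randn(N * K, 3, 84, 84)          # e.g. 84x84 RGB crops
support_labels = torch.arange(N).repeat_interleave(K)    # [0, 1, 2, 3, 4]

# Query set: images from the same N classes, unlabeled at prediction time.
query_images = torch.randn(N * Q, 3, 84, 84)
query_labels = torch.arange(N).repeat_interleave(Q)      # used only to score accuracy

# The model's job: assign each query image to one of the N episode classes,
# using nothing but the N*K support examples.
print(support_images.shape, query_images.shape)
```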
Directly fine-tuning a large CNN like ResNet or EfficientNet with only K examples per class often leads to severe overfitting. The model's high capacity allows it to simply memorize the few support examples without learning generalizable features for the new classes. Therefore, specialized techniques are required. FSL methods generally fall into a few categories, often building upon strong feature representations learned via pre-training on a larger, related dataset (like ImageNet).
The core idea behind metric learning for FSL is to learn an embedding function that maps images into a feature space where images from the same class are close together, and images from different classes are far apart. Classification can then be performed by comparing the embedding of a query image to the embeddings of the support examples.
Prototypical Networks offer an intuitive and effective metric learning approach. During training and testing, they operate on "episodes" designed to mimic the few-shot scenario. Within each episode, the embeddings of each class's support examples are averaged into a single "prototype", and a query image is classified by a softmax over its (negative) distances to these prototypes.
The episodic training forces the embedding function fϕ to produce representations that generalize well to new classes, since in every episode it must form tight clusters around well-separated prototypes for classes it has not been explicitly trained on before.
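A minimal sketch of one Prototypical Networks episode might look like the following. It assumes a placeholder embedding network `embed` (any CNN that maps images to feature vectors) and uses squared Euclidean distance; the function name and shapes are illustrative rather than taken from a particular implementation.

```python
import torch
import torch.nn.functional as F

def prototypical_loss(embed, support_images, support_labels,
                      query_images, query_labels, n_way):
    """One Prototypical Networks episode: class prototypes are the mean
    support embeddings; queries are classified by distance to each prototype."""
    z_support = embed(support_images)                  # (N*K, D)
    z_query = embed(query_images)                      # (N*Q, D)

    # Prototype for each class = mean of its support embeddings.
    prototypes = torch.stack([
        z_support[support_labels == c].mean(dim=0) for c in range(n_way)
    ])                                                 # (N, D)

    # Squared Euclidean distance between every query and every prototype.
    dists = torch.cdist(z_query, prototypes) ** 2      # (N*Q, N)

    # Softmax over negative distances: closer prototype -> higher probability.
    log_probs = F.log_softmax(-dists, dim=1)
    loss = F.nll_loss(log_probs, query_labels)
    acc = (log_probs.argmax(dim=1) == query_labels).float().mean()
    return loss, acc
```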
Another foundational metric learning technique involves Siamese Networks. These networks process pairs of images using identical CNNs (sharing weights ϕ). The network outputs embeddings for both images, and a distance function (like Euclidean distance or cosine similarity) compares these embeddings. The network is trained to minimize the distance for pairs of images from the same class and maximize it for pairs from different classes, often using a contrastive loss or triplet loss function. For few-shot classification, a query image's embedding can be compared against the embeddings of all support images, and classification is typically done based on the nearest support example's class.
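The pairwise objective and the nearest-support classification step can be sketched as follows, again assuming a placeholder embedding network `embed`. The contrastive-loss formulation shown here is one common variant; a triplet loss could be substituted in the same spirit.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(embed, img_a, img_b, same_class, margin=1.0):
    """Pull same-class pairs together, push different-class pairs apart until
    they are at least `margin` apart. `same_class` is a float tensor of
    1.0 (same class) / 0.0 (different class) per pair."""
    z_a, z_b = embed(img_a), embed(img_b)          # shared weights: one network, two passes
    dist = F.pairwise_distance(z_a, z_b)           # Euclidean distance per pair
    pos = same_class * dist.pow(2)                            # same class: shrink distance
    neg = (1 - same_class) * F.relu(margin - dist).pow(2)     # different class: enforce margin
    return (pos + neg).mean()

def classify_query(embed, query_image, support_images, support_labels):
    """Nearest-support classification: label the query with the class of the
    closest support embedding."""
    z_q = embed(query_image.unsqueeze(0))          # (1, D)
    z_s = embed(support_images)                    # (N*K, D)
    nearest = torch.cdist(z_q, z_s).argmin(dim=1)  # index of the closest support example
    return support_labels[nearest]
```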
Instead of learning a fixed embedding space, optimization-based methods focus on learning an algorithm or model initialization that can quickly adapt to new tasks using only a few examples. This is often referred to as "learning to learn" or meta-learning.
MAML (Model-Agnostic Meta-Learning) is a popular and versatile meta-learning algorithm. Its goal is to find a set of initial model parameters θ such that adapting these parameters to a new task requires only a few gradient steps on the task's small support set, leading to good performance on that task's query set.
The process involves two optimization loops:

- Inner loop: starting from the shared initialization θ, the model takes a few gradient steps on a single task's support set, producing task-adapted parameters.
- Outer loop: the adapted parameters are evaluated on that task's query set, and the resulting loss is used to update the initialization θ itself, averaged over many sampled tasks.
This outer loop update involves differentiating through the inner loop's gradient update, often requiring second-order derivatives (though first-order approximations are common). MAML aims to find an initialization θ that is positioned strategically in the parameter space, making it highly sensitive and adaptable to various few-shot tasks drawn from the task distribution p(task).
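A compact second-order MAML sketch is shown below. It assumes a recent PyTorch with `torch.func.functional_call` available, a classification model, and a list of tasks given as (support_x, support_y, query_x, query_y) batches; the inner learning rate and step count are placeholder hyperparameters.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def maml_outer_step(model, meta_optimizer, tasks, inner_lr=0.01, inner_steps=1):
    """One outer-loop update: adapt a copy of the initialization on each task's
    support set, evaluate on the query set, and backpropagate the query losses
    into the shared initialization (second-order MAML)."""
    meta_optimizer.zero_grad()
    meta_loss = 0.0
    for support_x, support_y, query_x, query_y in tasks:
        # Inner loop: a few gradient steps away from the shared initialization θ.
        params = dict(model.named_parameters())
        for _ in range(inner_steps):
            support_loss = F.cross_entropy(
                functional_call(model, params, (support_x,)), support_y)
            grads = torch.autograd.grad(
                support_loss, list(params.values()), create_graph=True)
            params = {name: p - inner_lr * g
                      for (name, p), g in zip(params.items(), grads)}

        # Outer loop: query loss of the adapted parameters, accumulated across tasks.
        query_loss = F.cross_entropy(
            functional_call(model, params, (query_x,)), query_y)
        meta_loss = meta_loss + query_loss

    (meta_loss / len(tasks)).backward()   # differentiates through the inner updates
    meta_optimizer.step()
```

Setting `create_graph=False` in the inner loop gives the cheaper first-order approximation mentioned above, at the cost of ignoring the second-order terms.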
Many FSL methods, particularly metric-learning approaches like Prototypical Networks, rely heavily on episodic training. This strategy directly simulates the few-shot problem during the training phase.
The episodic training process samples small N-way K-shot tasks (episodes) from a larger base dataset. The model is trained to perform well on the query examples within each episode, based only on the support examples provided for that episode.
In each training iteration:

1. Randomly sample N classes from the base dataset's class pool.
2. For each sampled class, draw K labeled examples to form the support set and a disjoint batch of examples to form the query set.
3. Use the support set to produce predictions for the query examples (for instance, via prototypes or a quick adaptation step).
4. Compute the loss on the query predictions and update the model's parameters.
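A minimal episode sampler following these steps might look like this, assuming the base dataset is available as `images_by_class`, a hypothetical dict mapping each class label to a tensor of that class's images (with at least K + Q images per class).

```python
import random

import torch

def sample_episode(images_by_class, n_way=5, k_shot=1, q_queries=15):
    """Draw one N-way K-shot episode from a larger labeled base dataset."""
    classes = random.sample(list(images_by_class.keys()), n_way)
    support_x, support_y, query_x, query_y = [], [], [], []
    for episode_label, cls in enumerate(classes):
        # Pick K support and Q query images for this class, without overlap.
        idx = torch.randperm(len(images_by_class[cls]))[: k_shot + q_queries]
        imgs = images_by_class[cls][idx]
        support_x.append(imgs[:k_shot])
        query_x.append(imgs[k_shot:])
        support_y += [episode_label] * k_shot    # labels are re-indexed 0..N-1 per episode
        query_y += [episode_label] * q_queries
    return (torch.cat(support_x), torch.tensor(support_y),
            torch.cat(query_x), torch.tensor(query_y))
```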
By repeatedly training on these diverse, randomly generated few-shot tasks, the model learns representations or adaptation strategies that are effective even for entirely new classes it encounters during meta-testing, provided they come from a similar distribution.
Few-shot learning is intrinsically linked to transfer learning. Most successful FSL approaches do not train the entire CNN from scratch using only the episodic procedure. Instead, they often use a CNN backbone pre-trained on a large dataset (like ImageNet) as a powerful feature extractor. The FSL technique (metric learning, MAML, etc.) is then applied primarily to the final layers or operates within the feature space produced by this pre-trained backbone. The pre-training provides a strong foundation of general visual features, which the FSL method then adapts or utilizes specifically for the task of discriminating between new classes based on few examples.
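As one illustrative way to combine the two, the sketch below freezes an ImageNet-pre-trained ResNet-18 from torchvision and performs nearest-prototype classification directly in its 512-dimensional feature space. This is a simple baseline recipe under those assumptions, not the only or necessarily the strongest choice.

```python
import torch
import torchvision

# Frozen, ImageNet-pre-trained backbone used purely as a feature extractor.
backbone = torchvision.models.resnet18(
    weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()   # drop the 1000-class head, keep the 512-d pooled features
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

@torch.no_grad()
def few_shot_predict(support_images, support_labels, query_images, n_way):
    """Nearest-prototype classification in the frozen pre-trained feature space.
    Inputs are assumed to be resized and normalized the same way as during
    the backbone's ImageNet pre-training."""
    z_s = backbone(support_images)    # (N*K, 512)
    z_q = backbone(query_images)      # (N*Q, 512)
    prototypes = torch.stack(
        [z_s[support_labels == c].mean(dim=0) for c in range(n_way)])
    return torch.cdist(z_q, prototypes).argmin(dim=1)  # predicted episode class per query
```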
Few-shot learning provides a powerful set of tools for adapting CNNs in data-scarce environments, extending the reach of computer vision beyond applications where massive labeled datasets are readily available. It represents a sophisticated form of model adaptation, pushing beyond standard fine-tuning to enable learning under significant data constraints.