Meta-learning fundamentally shifts the learning objective from mastering a single task to acquiring the ability to learn new tasks rapidly and efficiently. Unlike traditional supervised learning where a model is trained on a large dataset for one specific job, meta-learning operates on a distribution of tasks. The objective is to extract transferable knowledge or learning strategies that accelerate adaptation when faced with a novel task, particularly when data for that new task is scarce. This is often framed as "learning to learn."
At the core of the meta-learning problem lies the concept of a task. A task represents a specific learning problem, such as classifying a new set of images, translating between a novel pair of languages, or adapting a language model to a unique writing style. In the meta-learning framework, we assume tasks are drawn from an underlying probability distribution $p(\mathcal{T})$. This distribution defines the universe of problems the meta-learning algorithm is expected to handle.
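To make the idea of a task distribution concrete, here is a minimal sketch. Each draw from $p(\mathcal{T})$ yields a different regression problem from the same parametric family. The sine-regression family, the parameter ranges, and all names here are illustrative assumptions, not part of any standard API.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task(rng):
    """Draw one regression task from a toy distribution p(T):
    predict y = a * sin(x + b), with task-specific amplitude a and phase b."""
    amplitude = rng.uniform(0.1, 5.0)
    phase = rng.uniform(0.0, np.pi)
    def task_fn(x):
        return amplitude * np.sin(x + phase)
    return task_fn

# Each call yields a different learning problem from the same family.
tasks = [sample_task(rng) for _ in range(3)]
values_at_zero = [float(t(0.0)) for t in tasks]
```

Every sampled task shares the same input/output structure, which is what lets a single meta-learner be trained across all of them.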
The meta-learning process typically involves two phases:

- Meta-training: the model is trained on a large collection of tasks sampled from $p(\mathcal{T})$, learning meta-parameters that enable rapid adaptation to any task from the distribution.
- Meta-testing: the trained meta-learner is evaluated on new, held-out tasks drawn from the same distribution, adapting to each one using only a small amount of task-specific data.
Crucially, each individual task $\mathcal{T}_i$ within the meta-training or meta-testing set is itself structured as a small learning problem. It comprises two distinct subsets of data:

- Support set ($S_i$): a small set of labeled examples used to adapt the model to the task.
- Query set ($Q_i$): a separate set of examples from the same task, used to evaluate how well the adapted model performs.
This division into support and query sets within each task is fundamental. It simulates the few-shot scenario during meta-training: the model must learn from the support set how to perform well on the query set for that specific task.
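The support/query split within a single task might look like the following sketch. The five-shot split and the particular task function are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# One task's data: 10 points from y = 2*sin(x) (an illustrative task).
x = rng.uniform(-5.0, 5.0, size=10)
y = 2.0 * np.sin(x)

# Five "shots" for adaptation, five held-out points for evaluation.
support_x, support_y = x[:5], y[:5]
query_x, query_y = x[5:], y[5:]
# The learner adapts on (support_x, support_y) and is then
# scored on (query_x, query_y) from the same task.
```

The key point is that both subsets come from the same task, so query performance measures genuine adaptation rather than memorization of the support examples.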
Data flow within a single step of the meta-training phase. A task $\mathcal{T}_i$ is sampled and split into support ($S_i$) and query ($Q_i$) sets. The meta-model parameters ($\theta$) are adapted using $S_i$ to yield task-specific parameters ($\phi_i$). Performance is then evaluated on $Q_i$, and the resulting loss informs the update to the meta-parameters $\theta$.
Let $\theta$ represent the parameters of our meta-learned model or the parameters defining our learning procedure (e.g., initial weights of a neural network, parameters of an optimizer). The process of adapting these general parameters to task-specific parameters $\phi_i$ using the support set $S_i$ can be denoted by a function or algorithm $\text{Adapt}$. So, $\phi_i = \text{Adapt}(\theta, S_i)$.
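One common way to instantiate Adapt is a few gradient-descent steps on the support-set loss. The linear model, learning rate, and step count below are illustrative assumptions, not a prescribed choice:

```python
import numpy as np

def adapt(theta, support_x, support_y, lr=0.1, steps=5):
    """A sketch of Adapt: gradient-descent steps on the support-set MSE
    for a linear model y_hat = theta[0] * x + theta[1]."""
    phi = theta.copy()
    for _ in range(steps):
        err = phi[0] * support_x + phi[1] - support_y
        grad = np.array([np.mean(2 * err * support_x), np.mean(2 * err)])
        phi -= lr * grad
    return phi  # task-specific parameters phi_i

theta = np.zeros(2)                 # shared meta-parameters
sx = np.array([-1.0, 0.0, 1.0, 2.0])
sy = 3.0 * sx + 0.5                 # support data from the task y = 3x + 0.5
# Many steps here just to verify the procedure converges on this toy task.
phi = adapt(theta, sx, sy, steps=200)
```

Note that `theta` itself is unchanged; `adapt` returns a fresh set of task-specific parameters, matching the relation $\phi_i = \text{Adapt}(\theta, S_i)$.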
The ultimate goal of meta-training is to find the optimal meta-parameters $\theta^*$ that minimize the expected loss on the query sets across the distribution of tasks $p(\mathcal{T})$, after adaptation using the corresponding support sets $S_i$. If $\mathcal{L}(\phi_i, Q_i)$ represents the loss (e.g., cross-entropy, mean squared error) of the adapted model $\phi_i$ on the query set $Q_i$, the meta-objective can be formally stated as:

$$\theta^* = \arg\min_{\theta} \; \mathbb{E}_{\mathcal{T}_i \sim p(\mathcal{T})} \left[ \mathcal{L}(\text{Adapt}(\theta, S_i), Q_i) \right]$$
In practice, this expectation is approximated by averaging the query set loss over a batch of tasks sampled during each meta-training iteration.
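The batched approximation of the meta-objective can be sketched as a first-order MAML-style loop. This simplification ignores second-order terms, and the linear task family and all hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(w, x, y):
    """MSE loss and gradient for a linear model y_hat = w[0]*x + w[1]."""
    err = w[0] * x + w[1] - y
    return np.mean(err ** 2), np.array([np.mean(2 * err * x), np.mean(2 * err)])

theta = np.zeros(2)              # meta-parameters (shared initialization)
inner_lr, outer_lr, batch = 0.05, 0.01, 4

for step in range(500):          # meta-training iterations
    meta_grad = np.zeros_like(theta)
    for _ in range(batch):       # batch of tasks from a toy p(T): y = a*x + b
        a, b = rng.uniform(-2, 2), rng.uniform(-1, 1)
        x = rng.uniform(-2, 2, size=10)
        y = a * x + b
        sx, sy, qx, qy = x[:5], y[:5], x[5:], y[5:]   # support / query split
        # Inner loop: adapt theta to this task with one support-set step.
        _, g_s = loss_and_grad(theta, sx, sy)
        phi = theta - inner_lr * g_s
        # Outer loss: evaluate the adapted phi on the query set. Treating
        # d(phi)/d(theta) as the identity gives the first-order approximation.
        _, g_q = loss_and_grad(phi, qx, qy)
        meta_grad += g_q
    theta -= outer_lr * meta_grad / batch   # average over the task batch
```

Each outer iteration averages the post-adaptation query gradient over a small batch of sampled tasks, exactly the Monte Carlo estimate of the expectation described above.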
This formulation directly applies when working with large foundation models. Here, $\theta$ represents the potentially massive set of parameters of the foundation model (e.g., a Transformer). The Adapt function could be:

- Full fine-tuning: a few gradient steps on the support set, updating all of $\theta$.
- Parameter-efficient fine-tuning: updating only a small number of additional parameters (e.g., adapters or low-rank updates) while the backbone stays frozen.
- In-context adaptation: conditioning the frozen model on the support examples (e.g., in its prompt), with no gradient updates at all.
Regardless of the specific Adapt mechanism, the meta-learning goal remains consistent: find initial parameters $\theta$ (or a way to generate them) such that the model performs well on the query set $Q_i$ after seeing only the small support set $S_i$. The challenge lies in efficiently performing the meta-optimization (finding $\theta^*$) and the adaptation ($\phi_i = \text{Adapt}(\theta, S_i)$) given the enormous scale of $\theta$ in foundation models, a central theme explored throughout this course. Understanding this core problem structure is essential before examining specific algorithms designed to solve it.
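For intuition on the parameter-efficient flavor of Adapt, the sketch below freezes a "backbone" weight matrix and trains only a rank-1 (LoRA-style) update on the support set. All sizes, names, and the synthetic task are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
W = rng.normal(size=(d, d))        # frozen "pretrained" weights (part of theta)
u = np.zeros((d, 1))               # trainable low-rank factors: delta = u @ v.T
v = rng.normal(size=(d, 1)) * 0.1

# Synthetic support set drawn from a slightly shifted version of the backbone.
X = rng.normal(size=(16, d))
Y = X @ (W + rng.normal(size=(d, d)) * 0.1).T

loss0 = np.mean((X @ W.T - Y) ** 2)          # frozen backbone, no adaptation

lr = 0.01
for _ in range(300):
    err = X @ (W + u @ v.T).T - Y            # support-set residual
    g_delta = 2 * err.T @ X / len(X)         # dLoss/d(delta); W is never updated
    u, v = u - lr * (g_delta @ v), v - lr * (g_delta.T @ u)

loss_final = np.mean((X @ (W + u @ v.T).T - Y) ** 2)
```

Only `u` and `v` (16 numbers here) are trained, while the 64-entry backbone stays fixed; the same economy is what makes low-rank adaptation attractive when $\theta$ has billions of parameters.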