While model inference involves using a completed, pre-trained model to perform tasks like generating text or answering questions, model training is the fundamental process where the model learns its capabilities in the first place. Think of inference as taking a final exam, whereas training is the entire period of study, practice, and learning that leads up to it.
Training an LLM, particularly from scratch (often called pre-training), is a computationally intensive process. It involves exposing the model to enormous datasets, typically containing terabytes of text and code scraped from the internet, books, and other sources.
The core idea is to adjust the model's internal parameters – the millions or billions of values we discussed in Chapter 1 – so that the model gets progressively better at a specific objective. For most LLMs, this objective is predicting the next word (more precisely, the next token) in a sequence of text.
Here’s a simplified view of the process:

1. Feed the model a batch of text from the training data.
2. Have the model predict the next token at each position.
3. Measure how far the predictions are from the actual text (the loss).
4. Compute gradients, which indicate how each parameter should change to reduce the loss.
5. Nudge the parameters slightly in that direction, then repeat with the next batch.
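This loop maps directly onto a few lines of framework code. Below is a minimal sketch in PyTorch using a deliberately tiny bigram model (a stand-in for a real transformer) and random token IDs in place of a real dataset; `BigramLM`, the vocabulary size, and the batch shape are all illustrative assumptions, not a realistic pre-training setup.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 1_000  # deliberately tiny; real LLM vocabularies run to ~50,000+

class BigramLM(nn.Module):
    """Toy model: predict the next token from the current token alone."""
    def __init__(self):
        super().__init__()
        self.logits_table = nn.Embedding(VOCAB_SIZE, VOCAB_SIZE)

    def forward(self, tokens):
        return self.logits_table(tokens)  # one row of next-token logits per input token

model = BigramLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(1_000):  # real pre-training runs for hundreds of thousands of steps
    batch = torch.randint(0, VOCAB_SIZE, (8, 128))  # stand-in for a batch of token IDs
    inputs, targets = batch[:, :-1], batch[:, 1:]   # shift by one: each target is the next token
    logits = model(inputs)                          # step 2: predict
    loss = loss_fn(logits.reshape(-1, VOCAB_SIZE),  # step 3: measure the error
                   targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                                 # step 4: compute gradients
    optimizer.step()                                # step 5: nudge the parameters
```

The same five-step structure holds whether the model has a million parameters or a trillion; only the scale of the data, the model, and the hardware changes.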
This iterative refinement of billions of parameters across massive datasets requires immense computational power. It involves not only storing the parameters (as in inference) but also storing the intermediate values needed for the updates (such as gradients and optimizer states) and performing the calculations that apply them.
Furthermore, training often involves processing data in batches and requires significantly more memory (both VRAM and system RAM) and computational throughput (measured in FLOPS, or floating-point operations per second) than inference. This is why training large foundation models typically requires large clusters of powerful GPUs or specialized AI accelerators like TPUs, consuming significant amounts of energy and time.
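To make the gap concrete, here is a rough back-of-envelope calculation for a hypothetical 7-billion-parameter model stored in FP16 and trained with mixed precision and the Adam optimizer. A common rule of thumb puts this at roughly 16 bytes of state per parameter during training versus 2 bytes for FP16 inference; exact figures vary by setup, and this ignores activation memory entirely.

```python
# Rough memory tally for a hypothetical 7B-parameter model. All figures are
# approximations and exclude activation memory, which adds considerably more.
PARAMS = 7e9
GB = 1e9

inference_weights = PARAMS * 2 / GB     # FP16 weights only: ~14 GB

train_weights  = PARAMS * 2 / GB        # FP16 working copy of the weights
train_grads    = PARAMS * 2 / GB        # FP16 gradients
adam_states    = PARAMS * 4 * 2 / GB    # FP32 momentum + variance (Adam)
master_weights = PARAMS * 4 / GB        # FP32 master copy (mixed-precision training)
training_total = train_weights + train_grads + adam_states + master_weights

print(f"Inference: ~{inference_weights:.0f} GB")  # ~14 GB
print(f"Training:  ~{training_total:.0f} GB")     # ~112 GB, before activations
```

Even under these simplified assumptions, training needs roughly eight times the memory of inference for the same model, which is before counting activations or the duplication across the many GPUs of a training cluster.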
While pre-training a model from scratch is the most demanding form, another common type is fine-tuning. This involves taking an already pre-trained model and further training it on a smaller, more specific dataset to adapt it for a particular task or domain (like medical text analysis or customer support). Fine-tuning still involves adjusting parameters and requires substantial hardware, but generally far less than pre-training from scratch.
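To illustrate why fine-tuning can be cheaper, here is a minimal PyTorch sketch of one common pattern: freeze the pre-trained parameters and train only a small new head on the task data. The `hidden_dim` and `num_labels` arguments are hypothetical placeholders, and real fine-tuning approaches (full fine-tuning, adapter methods like LoRA, and so on) differ in their details.

```python
import torch
import torch.nn as nn

def prepare_for_finetuning(pretrained: nn.Module, hidden_dim: int, num_labels: int):
    """Freeze a pre-trained backbone and attach a small trainable task head.

    Assumes `pretrained` maps input batches to `hidden_dim`-sized features;
    this is an illustrative pattern, not a complete recipe.
    """
    for param in pretrained.parameters():
        param.requires_grad = False          # frozen weights need no gradients
                                             # or optimizer state, saving memory
    head = nn.Linear(hidden_dim, num_labels) # the only part that will be trained
    model = nn.Sequential(pretrained, head)

    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=1e-4)
    return model, optimizer
```

Because gradients and optimizer states exist only for the small head, the memory overhead of training shrinks dramatically compared with updating every parameter of the backbone.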
Understanding this distinction is important: the hardware needed to create or significantly adapt an LLM through training far exceeds the requirements for simply using an existing one for inference. For most users interacting with LLMs, the focus will be on the hardware needed for inference, which we detailed previously and will use for estimation in the next chapter.