Having examined Large Language Models, their size in parameters, and the associated hardware components like GPUs and VRAM, we now turn to how these models are actually used. There are two fundamental modes of operation: using an existing model to generate outputs (inference) and creating or adapting a model in the first place (training).
These distinct activities impose substantially different loads on the hardware resources discussed earlier. Recognizing the difference between inference and training is key to understanding why certain hardware configurations are suitable for simply running an LLM, while others are necessary for development or fine-tuning.
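As a rough, back-of-envelope illustration of this gap, the sketch below compares estimated GPU memory for serving a model in FP16 against full training with the Adam optimizer in mixed precision. The bytes-per-parameter figures are common rules of thumb (about 2 bytes per parameter for FP16 weights; roughly 16 bytes per parameter once gradients, FP32 master weights, and Adam states are included), they ignore activations, KV cache, and framework overhead, and the 7-billion-parameter model size is just an illustrative assumption.

```python
def inference_memory_gb(num_params, bytes_per_param=2):
    """Rough weight-only memory for serving in FP16/BF16 (~2 bytes per parameter)."""
    return num_params * bytes_per_param / 1e9

def training_memory_gb(num_params, bytes_per_param=16):
    """Rough memory for full training with Adam in mixed precision:
    ~2 bytes FP16 weights + 2 bytes FP16 gradients + ~12 bytes optimizer state
    (FP32 master weights plus Adam moments) per parameter.
    Ignores activation memory, which also grows with batch size and sequence length.
    """
    return num_params * bytes_per_param / 1e9

params = 7e9  # example: a 7B-parameter model
print(f"Inference (FP16 weights only): ~{inference_memory_gb(params):.0f} GB")
print(f"Training (Adam, mixed precision): ~{training_memory_gb(params):.0f} GB")
```

For a 7B-parameter model, these rules of thumb suggest roughly 14 GB for weight-only inference versus over 100 GB for full training, which is why the two workloads lead to very different hardware configurations.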
This chapter covers the following points:

4.1 What is Model Inference?
4.2 Hardware Needs for Inference
4.3 What is Model Training?
4.4 Hardware Needs for Training
4.5 Focus on Inference Requirements

By comparing the resource needs of these two processes, we can better contextualize the hardware estimation techniques discussed later.