Now that we understand that parameters are the adjustable values an LLM learns during its training, the next logical question is: how do we quantify the "size" of these massive models?
The most common way to measure the size of a Large Language Model is by counting its total number of trainable parameters. Think of these parameters, as discussed previously, as the internal knobs and dials the model adjusts to learn patterns and relationships in the language data it is trained on.
Why use the parameter count? It serves as a direct indicator of the model's potential complexity and capacity. A model with more parameters can generally store more information, capture finer nuances in language, and perform better on complex tasks than a model with fewer parameters. It is a useful, though imperfect, proxy for how capable a model might be.
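If you want to see what "counting trainable parameters" looks like in practice, here is a minimal sketch using PyTorch with a tiny stand-in model (not a real LLM): the count is simply the total number of elements across all tensors the optimizer is allowed to update.

```python
import torch.nn as nn

# A tiny stand-in model; a real LLM stacks many transformer blocks instead.
model = nn.Sequential(
    nn.Embedding(num_embeddings=32_000, embedding_dim=512),  # token embedding table
    nn.Linear(512, 2048),   # feed-forward expansion
    nn.ReLU(),
    nn.Linear(2048, 512),   # feed-forward projection
)

# The "size" of the model: the total number of elements in all trainable tensors.
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable_params:,}")
```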
You'll often hear LLM sizes discussed using a specific unit: billions of parameters, frequently abbreviated with a 'B'. For example, a model described as "7B" has roughly 7 billion trainable parameters, while a "70B" model has roughly 70 billion.
The sheer scale is what puts the "Large" in Large Language Models. These numbers are significantly bigger than those found in many earlier types of machine learning models.
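To connect raw counts to the 'B' shorthand, the following small sketch (plain Python, using made-up example counts) formats a parameter count in billions:

```python
def format_param_count(num_params: int) -> str:
    """Express a raw parameter count using the common 'B' (billions) shorthand."""
    return f"{num_params / 1e9:.1f}B"

# Hypothetical example counts, roughly spanning small to very large models.
for count in (1_500_000_000, 7_000_000_000, 70_000_000_000, 400_000_000_000):
    print(f"{count:>15,d} parameters -> {format_param_count(count)}")
```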
Approximate parameter counts for different LLM size categories. The logarithmic scale on the vertical axis helps visualize the significant differences in magnitude between these sizes.
This measurement, the parameter count, matters for this course because it directly impacts the computational resources needed. As we'll see, a larger number of parameters generally translates to higher requirements for memory (such as GPU VRAM) and processing power. Knowing the parameter count is the first step in estimating the hardware you might need.
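As a rough illustration of why parameter count drives memory requirements, the sketch below estimates how much memory the weights alone occupy at different numeric precisions. This is a simplification: it ignores activations, the KV cache, and framework overhead, and assumes standard FP32/FP16/INT8/INT4 storage sizes.

```python
def weight_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Estimate memory needed just to hold the model weights, in gigabytes."""
    return num_params * bytes_per_param / 1e9

num_params = 7_000_000_000  # a hypothetical 7B-parameter model

# Approximate bytes per parameter for common numeric formats.
precisions = {"FP32": 4, "FP16": 2, "INT8": 1, "INT4": 0.5}

for name, bytes_per_param in precisions.items():
    print(f"{name}: ~{weight_memory_gb(num_params, bytes_per_param):.1f} GB for weights alone")
```

For the hypothetical 7B model, this gives roughly 28 GB at FP32 and 14 GB at FP16, which is why precision and quantization come up so often when matching models to available hardware.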