When browsing for Large Language Models (LLMs), one of the first things you'll notice is labels like "7B", "13B", or even "70B". This refers to the model's size, specifically the number of parameters it contains. Understanding what this means is fundamental to selecting a model that will work effectively on your hardware.
Think of an LLM as an incredibly complex network, somewhat analogous to the connections between neurons in a brain. During its training phase, the LLM learns by adjusting the strengths of these connections. Each adjustable connection strength, or weight, is a parameter. These parameters store the "knowledge" the model has learned from the vast amounts of text data it was trained on. They determine how the model responds to your prompts, predicting the next word (or token, as we learned in Chapter 1) based on the input it receives.
A model with more parameters has, in essence, more capacity to store intricate patterns, nuances of language, and diverse information learned during training. Imagine a very complex machine with millions or billions of tiny adjustable knobs (the parameters). Training sets these knobs to the right positions to produce coherent and relevant text.
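To make the idea of parameter counts concrete, here is a minimal Python sketch with purely illustrative layer sizes (not the architecture of any particular model) showing how the parameters of simple fully connected layers add up into the millions and billions:

```python
# Purely illustrative: counting the adjustable "knobs" (parameters) in a few
# fully connected layers to show how totals reach into the billions.

def linear_layer_params(d_in: int, d_out: int) -> int:
    # Weights (d_in * d_out) plus one bias value per output unit.
    return d_in * d_out + d_out

hidden = 4096  # hidden size loosely in the range used by mid-sized LLMs
one_layer = linear_layer_params(hidden, hidden)
print(f"One {hidden}x{hidden} layer: {one_layer:,} parameters")  # ~16.8 million

num_layers = 400  # illustrative only; real models mix attention, MLP, and embedding layers
print(f"{num_layers} such layers: {num_layers * one_layer:,} parameters")  # ~6.7 billion
```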
Model size is typically measured by the total count of these parameters. The numbers you see, like 7B, 13B, or 70B, are shorthand: the "B" stands for billion, so a 7B model has roughly 7 billion parameters, a 13B model roughly 13 billion, and a 70B model roughly 70 billion.
Larger numbers indicate more parameters and, consequently, a "larger" model. While not the only factor, the parameter count is a primary indicator of a model's potential complexity and resource needs.
The number of parameters directly influences several practical aspects of running an LLM locally: how much memory the model needs, how quickly it generates text, and how capable its output tends to be.
Memory is often the most immediate constraint for local use. Every parameter in the model must be loaded into your computer's memory (RAM) or your graphics card's memory (VRAM) before the model can generate anything.
A general illustration of how memory requirements increase with model parameter count. Actual needs vary based on model format and quantization.
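As a rough illustration of that relationship, here is a simple back-of-the-envelope calculation in Python. It estimates only the memory needed to hold the weights themselves (parameter count times bytes per parameter); actual requirements are higher once activations, the KV cache, and framework overhead are included, and they drop with quantization:

```python
# Back-of-the-envelope estimate: memory to hold the weights alone.
# Real usage is higher (activations, KV cache, framework overhead).

def estimate_weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    return num_params * bytes_per_param / (1024 ** 3)

for label, params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    gb = estimate_weight_memory_gb(params, 2.0)  # 2 bytes/param = 16-bit weights
    print(f"{label}: ~{gb:.0f} GB of RAM/VRAM just for the weights at 16-bit")
```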
Inference is the process of using the trained model to generate text from your prompt. Producing each token involves computations across all those billions of parameters, so larger models generally generate text more slowly on the same hardware.
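One widely used rule of thumb (an approximation, not a guarantee) is that single-user text generation tends to be limited by memory bandwidth: each new token requires reading the model's weights from memory, so tokens per second is very roughly memory bandwidth divided by the model's size in memory. A small sketch with illustrative numbers:

```python
# Rule-of-thumb only: generation speed is often memory-bandwidth bound, since
# each new token requires streaming the weights through memory once.

def rough_tokens_per_second(model_size_gb: float, bandwidth_gb_per_s: float) -> float:
    return bandwidth_gb_per_s / model_size_gb

# Illustrative figures: a ~13 GB model (7B at 16-bit) on hardware with
# ~100 GB/s of effective memory bandwidth.
print(f"~{rough_tokens_per_second(13, 100):.1f} tokens/second")  # roughly 7-8
```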
While not a perfect correlation, larger models generally exhibit more sophisticated language understanding and generation capabilities.
Choosing a model size involves balancing desired capabilities against your available hardware resources and tolerance for generation speed.
For your first steps into local LLMs, starting with a smaller model (like a 7B variant) is often recommended. It allows you to get things running, understand the workflow, and gauge performance on your system without immediately hitting hardware limitations. As you become more familiar, you can experiment with larger models if your hardware permits. The concept of quantization, discussed next, provides a way to make even larger models more accessible.
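As a quick sanity check before downloading anything, you can reuse the weight-memory estimate from earlier to see which sizes could plausibly fit in your available RAM or VRAM. This sketch assumes a hypothetical 16 GB budget and 16-bit weights; it leaves no headroom for activations or other overhead, and quantized formats (covered next) change the bytes-per-parameter figure:

```python
# Hypothetical sanity check: which sizes might fit a given memory budget,
# counting only the weights (no headroom for activations or overhead).

def fits_in_memory(num_params: float, bytes_per_param: float, budget_gb: float) -> bool:
    needed_gb = num_params * bytes_per_param / (1024 ** 3)
    return needed_gb <= budget_gb

budget_gb = 16  # adjust to your own RAM/VRAM
for label, params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    verdict = "might fit" if fits_in_memory(params, 2.0, budget_gb) else "unlikely to fit"
    print(f"{label} at 16-bit weights: {verdict} in {budget_gb} GB")
```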