Okay, let's build on the idea that Large Language Models process and generate language. How do they actually learn to do this? The answer lies in something called parameters.
Think of an LLM as an incredibly complex network, loosely inspired by the connections in a brain. This network is made up of layers of interconnected "nodes" or "neurons." When you train an LLM, you feed it vast amounts of text data. The training process adjusts the "strength" or "importance" of the connections between these neurons. These adjustable connection strengths are the parameters.
In machine learning, and in LLMs specifically, parameters are variables internal to the model whose values are learned from data during training. Collectively, they define what the model can do: grasp grammar, recall facts, reason through problems, and produce different styles of text.
You can imagine parameters as millions (or billions!) of tiny dials within the model. Each dial's setting is adjusted during training based on the examples the model sees. The final settings of all these dials represent the "knowledge" the model has acquired.
Common types of parameters in neural networks include:

- **Weights:** values attached to the connections between neurons, scaling how strongly one neuron's output influences the next.
- **Biases:** values added to each neuron's combined input, shifting the point at which the neuron activates.
For an LLM, the parameters collectively capture the patterns, structures, and nuances of the language it was trained on.
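To make the counting concrete, here is a minimal sketch that tallies the weights and biases of a tiny fully connected network. The layer sizes are made up purely for illustration; real LLMs use far larger (and architecturally different) layers, but the principle of counting every learnable value is the same.

```python
# Hypothetical layer sizes for a tiny feed-forward network:
# input -> hidden -> hidden -> output
layer_sizes = [512, 256, 128, 64]

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    weights = n_in * n_out  # one weight per connection between layers
    biases = n_out          # one bias per neuron in the receiving layer
    total += weights + biases

print(total)  # 172480 learnable parameters
```

Even this toy network has over 170,000 parameters; scaling the same idea up is how models reach billions.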
The total number of these learnable parameters (weights and biases combined) is the standard way we measure the "size" of an LLM. Why? Because the parameter count directly reflects both the model's capacity to store patterns from its training data and the memory and compute needed to run it.
So, when you hear about a model having "7 billion parameters" (often written as 7B), it means the model has 7,000,000,000 adjustable weights and biases that were learned during its training. This number gives us a rough but useful estimate of the model's potential capabilities and, importantly for this course, its hardware requirements.
A highly simplified view of connections (edges) between neurons (nodes) in different layers. The labels 'w' represent weights (parameters), indicating the strength of each connection. Biases (also parameters) are associated with neurons in the hidden and output layers. The total count of all 'w' values and biases gives the parameter count.
Understanding parameters is fundamental because this count is the primary factor determining how much memory and computational power an LLM requires, which we will explore in the upcoming chapters.
© 2025 ApX Machine Learning