Small Language Models (SLMs) provide an alternative to massive multi-billion parameter networks. They run locally and operate within strict hardware limits. Before writing code to fine-tune a model, you need to establish a baseline understanding of how these architectures function and how supervised learning updates their internal weights.
In this chapter, you will look at the exact mechanics of small language models. We start by defining what qualifies as an SLM and comparing fine-tuning directly against alternatives like Retrieval-Augmented Generation (RAG).
You will examine the mathematics of supervised learning. Take, for example, the fundamental weight update rule in gradient descent:

w_{t+1} = w_t − η ∇L(w_t)

Here, w_t represents the model weights at step t, η is the learning rate, and ∇L(w_t) is the gradient of the loss function with respect to those weights. Understanding these mechanics clarifies exactly what happens when you feed custom data into a pre-trained network.
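To make the update rule concrete, here is a minimal sketch of gradient descent on a toy one-parameter problem. The loss, data point, and learning rate (L(w) = (w·x − y)², x = 1, y = 3, η = 0.1) are illustrative assumptions, not values from any real model, but the update line is exactly the rule above.

```python
def grad_loss(w, x, y):
    """Gradient of the toy loss L(w) = (w*x - y)**2 with respect to w."""
    return 2 * (w * x - y) * x

w = 0.0          # initial weight w_0
eta = 0.1        # learning rate (eta)
x, y = 1.0, 3.0  # one training example: input and target (toy values)

for step in range(50):
    # The update rule: w_{t+1} = w_t - eta * grad L(w_t)
    w = w - eta * grad_loss(w, x, y)

print(round(w, 4))  # converges toward y / x = 3.0
```

Fine-tuning a language model performs this same update simultaneously across millions or billions of weights, with the gradient computed by backpropagation over batches of training tokens.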
We will also calculate specific hardware requirements. You will learn how to estimate the Video RAM (VRAM) needed to load and train these models without running into out-of-memory errors. Finally, you will write a Python script to load a pre-trained SLM into memory and perform basic text generation.
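As a preview of the hardware math, the sketch below estimates VRAM from the parameter count. The rule of thumb assumed here is: weights take `params × bytes_per_param` (2 bytes in fp16), and full fine-tuning with an Adam-style optimizer roughly adds gradients plus two optimizer-state buffers of the same size. The 1.3B parameter count is an example figure, not tied to a specific model, and real usage also includes activations and framework overhead not counted here.

```python
def vram_gb(params_billion, bytes_per_param=2, training=False):
    """Rough VRAM estimate in GiB; ignores activations and overhead."""
    params = params_billion * 1e9
    weights = params * bytes_per_param     # memory for the weights themselves
    total = weights
    if training:
        # Gradients plus two Adam moment buffers, assumed at the same precision
        total += 3 * weights
    return total / 1024**3

print(f"{vram_gb(1.3):.1f} GiB to load (fp16)")
print(f"{vram_gb(1.3, training=True):.1f} GiB to fine-tune (rough)")
```

Chapter 1.4 develops this estimate in detail; the point here is that inference and full fine-tuning differ by several multiples of the weight memory, which is why a model that loads comfortably can still exhaust VRAM during training.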
1.1 What Is a Small Language Model?
1.2 Fine-Tuning vs Retrieval-Augmented Generation
1.3 Supervised Fine-Tuning Mechanics
1.4 Hardware Requirements and Memory Constraints
1.5 Hands-On Practical: Initializing a Pre-Trained SLM