Think of your computer's memory like a workspace. While the Central Processing Unit (CPU) is the worker doing calculations, it needs a place to temporarily hold the instructions and data it's actively working on. This primary workspace is called Random Access Memory (RAM), often referred to simply as system memory.
RAM is a type of electronic storage that is much faster than long-term storage devices like Solid State Drives (SSDs) or Hard Disk Drives (HDDs). When you launch a program or open a file, the necessary data is loaded from your slower storage drive into the much faster RAM so the CPU can access it quickly. This speed is essential for a smooth computing experience.
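To make this concrete, here is a minimal Python sketch that times the one-time cost of pulling data off the storage drive versus working with it once it already sits in RAM. The filename is a placeholder; substitute any large file on your own machine.

```python
import time

path = "example_data.bin"  # hypothetical file; use any large local file

# First access: the bytes travel from the storage drive into RAM.
start = time.perf_counter()
with open(path, "rb") as f:
    data = f.read()  # the file contents now live in system RAM
print(f"Load from storage: {time.perf_counter() - start:.4f} s")

# Subsequent access: the data is already in RAM, so no disk trip is needed.
start = time.perf_counter()
total = sum(data[::4096])  # touch one byte from each 4 KB block
print(f"Scan in RAM:       {time.perf_counter() - start:.4f} s")
```

Note that operating systems also cache recently read files in spare RAM, so a second run of the script may show the "load" step speeding up dramatically, which is itself a demonstration of how much faster RAM is than the drive.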
RAM serves several important functions: it holds the running operating system, the applications you have open, and the data those applications are actively working with, all within quick reach of the CPU.
The key characteristic of RAM is that it's volatile. This means it only holds data while the computer has power. When you turn off your computer, everything stored in RAM disappears. That's why you need to save your work to a persistent storage device like an SSD or HDD.
Diagram: the basic relationship between the CPU, RAM, GPU, VRAM, and storage. Data moves from persistent storage into RAM for general processing, and often onward into VRAM for GPU-intensive tasks such as running LLMs.
RAM capacity is measured in gigabytes (GB). Common amounts in modern computers range from 8GB to 16GB for basic tasks, 32GB for more demanding use, and 64GB or more for high-performance workstations.
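If you want to check how much RAM a machine has from code, the cross-platform psutil library (a third-party package, installed with `pip install psutil`) reports it directly:

```python
import psutil

# Total physical memory is reported in bytes; convert to gigabytes.
total_bytes = psutil.virtual_memory().total
print(f"Installed RAM: {total_bytes / (1024 ** 3):.1f} GB")
```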
How much RAM do you need for working with Large Language Models? While the LLM parameters themselves are often loaded into the specialized VRAM of a GPU (which we'll cover next), system RAM still plays a supporting role: it holds the operating system and the inference software, and it typically stages the model file as it is read from storage on its way to the GPU.
For typical LLM inference (using a model, not training it), the amount of system RAM is usually less restrictive than the amount of VRAM. However, having insufficient RAM (e.g., trying to run demanding software on a system with only 4GB or 8GB) can still cause slowdowns or prevent applications from running correctly, regardless of the AI task.
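As a back-of-envelope check, you can estimate the memory a model's weights occupy by multiplying the parameter count by the bytes used per parameter. This sketch ignores activations, the KV cache, and framework overhead, so treat the results as lower bounds:

```python
def weights_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the model weights."""
    return num_params * bytes_per_param / 1e9

# A 7-billion-parameter model at different numeric precisions:
print(f"fp16 (2 bytes/param):   {weights_gb(7e9, 2):.1f} GB")    # ~14.0 GB
print(f"int8 (1 byte/param):    {weights_gb(7e9, 1):.1f} GB")    # ~7.0 GB
print(f"int4 (0.5 bytes/param): {weights_gb(7e9, 0.5):.1f} GB")  # ~3.5 GB
```

This is why quantized (int8 or int4) versions of models are popular: they shrink the memory footprint enough to fit into the RAM or VRAM of ordinary machines.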
Besides capacity (how much data it can hold), RAM also has speed (how fast data can be transferred) and latency (the delay before a transfer begins). Speed is quoted in megatransfers per second (MT/s), often loosely labeled as megahertz (MHz), while latency is expressed as timings such as CL (CAS latency). While faster RAM can improve overall system responsiveness, for running pre-trained LLMs the capacity of both RAM and especially VRAM generally matters more than minor differences in RAM speed.
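For intuition about what those MT/s figures mean, the theoretical peak transfer rate of a standard 64-bit memory channel is simply transfers per second multiplied by 8 bytes per transfer:

```python
def channel_bandwidth_gb_s(mt_per_s: float, bus_width_bytes: int = 8) -> float:
    """Theoretical peak bandwidth of one 64-bit (8-byte) memory channel."""
    return mt_per_s * 1e6 * bus_width_bytes / 1e9

# DDR4-3200 as an example: 3200 MT/s on a 64-bit bus.
single = channel_bandwidth_gb_s(3200)
print(f"Single channel: {single:.1f} GB/s")      # 25.6 GB/s
print(f"Dual channel:   {2 * single:.1f} GB/s")  # 51.2 GB/s
```

Real-world throughput falls short of these peaks, but the calculation shows why running memory in dual-channel mode matters more than a small bump in MT/s.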
In summary, system RAM is the computer's main workspace, essential for running the operating system, applications, and handling active data. While not typically the primary bottleneck for holding the LLM parameters themselves (that's usually VRAM), sufficient RAM is necessary for the overall system stability and performance required to work with AI models effectively. Next, we will look at the specialized hardware that excels at the parallel computations needed for LLMs: the GPU and its dedicated memory, VRAM.