Training large language models demands substantial computational resources. At billions or even trillions of parameters, these models exceed the capabilities of conventional hardware. This chapter focuses on the hardware systems that make LLM training feasible.
We will examine the architectural characteristics of Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs) that accelerate deep learning computations. You will learn about memory requirements, particularly High Bandwidth Memory (HBM), and the role of high-speed interconnects such as NVLink and InfiniBand in distributed training setups. We will also analyze the trade-offs in hardware selection, weighing cost, performance, and availability. A solid grasp of these hardware considerations is necessary for planning and managing large-scale model training.
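To make the memory pressure concrete before diving into the sections below, here is a minimal back-of-the-envelope sketch. It assumes a common mixed-precision setup: bf16 weights and gradients plus fp32 Adam optimizer states (an fp32 master copy and two moment estimates, about 16 bytes per parameter in total). The `training_memory_gb` helper is an illustrative assumption, not a library function, and it deliberately ignores activation memory, which varies with batch size and sequence length.

```
# Rough per-replica memory estimate for training a dense transformer
# with Adam in mixed precision. Byte counts are illustrative assumptions.

def training_memory_gb(num_params: float) -> dict:
    """Approximate training-state memory (GB) for num_params weights.

    Assumes bf16 weights and gradients (2 bytes each) and fp32 optimizer
    state (4-byte master copy plus two 4-byte Adam moments). Activations
    are excluded; they depend on batch size and sequence length.
    """
    bytes_weights = 2 * num_params             # bf16 parameters
    bytes_grads = 2 * num_params               # bf16 gradients
    bytes_optimizer = (4 + 4 + 4) * num_params  # fp32 master copy + Adam m, v
    total = bytes_weights + bytes_grads + bytes_optimizer
    return {
        "weights_gb": bytes_weights / 1e9,
        "gradients_gb": bytes_grads / 1e9,
        "optimizer_gb": bytes_optimizer / 1e9,
        "total_gb": total / 1e9,
    }

# A 7B-parameter model already needs ~112 GB of training state before
# activations, more than a single 80 GB accelerator provides.
print(training_memory_gb(7e9))
```

Under these assumptions, even a 7B-parameter model requires roughly 112 GB of training state, which is why per-device HBM capacity and the interconnects that let many devices share a workload dominate the planning discussed in this chapter.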
18.1 GPU Architectures (NVIDIA Ampere, Hopper)
18.2 TPU Architectures (Google TPUs)
18.3 Memory Requirements (HBM, GPU RAM)
18.4 Interconnect Technologies (NVLink, InfiniBand)
18.5 Hardware Selection Trade-offs (Cost, Performance, Availability)