Learn techniques to reduce the size and computational cost of Large Language Models (LLMs) through quantization. This course provides practical methods for applying quantization, covering popular techniques like Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT), common formats such as GPTQ and GGUF, and the evaluation of performance trade-offs. Gain hands-on experience quantizing LLMs for efficient deployment.
Prerequisites: Familiarity with Python programming, fundamental machine learning concepts, and a basic understanding of Large Language Models (LLMs).
Level: Intermediate
Quantization Principles
Understand the fundamental concepts behind model quantization and its benefits for LLMs.
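As a preview of the core idea, here is a minimal sketch of symmetric int8 quantization: map floats onto 256 integer levels using a single scale factor, then recover an approximation by multiplying back. The NumPy code and function names are illustrative, not part of any particular library.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric int8 quantization: map floats to [-127, 127]."""
    scale = np.max(np.abs(x)) / 127.0        # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(weights)
error = np.abs(weights - dequantize(q, scale)).max()
print(f"max round-trip error: {error:.5f}")
```

The round-trip error is the quantization noise the rest of the course works to measure and minimize.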
Post-Training Quantization (PTQ)
Implement various PTQ techniques, including calibration and handling outliers.
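The sketch below illustrates one common calibration strategy: run representative data through the model, record activation magnitudes, and clip at a high percentile so rare outliers do not inflate the scale. The function and dataset here are hypothetical placeholders for what the chapter covers in full.

```python
import numpy as np

def calibrate_scale(activations: np.ndarray, percentile: float = 99.9) -> float:
    """Derive an int8 scale from calibration activations.

    Clipping at a high percentile (rather than the absolute max)
    keeps a few extreme outliers from wasting quantization levels
    on values that almost never occur.
    """
    clip_val = np.percentile(np.abs(activations), percentile)
    return clip_val / 127.0

# Hypothetical calibration pass: a few batches of recorded activations.
calib_batches = [np.random.randn(32, 768) for _ in range(8)]
all_acts = np.concatenate([b.ravel() for b in calib_batches])
print(f"calibrated activation scale: {calibrate_scale(all_acts):.5f}")
```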
Advanced PTQ Methods
Apply advanced PTQ algorithms such as GPTQ and understand activation-aware approaches like AWQ, as sketched below.
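In practice, one way to apply GPTQ is through Hugging Face transformers' GPTQConfig, which quantizes a model layer by layer against a calibration dataset while loading. This sketch assumes the optimum and auto-gptq packages are installed; the model id is chosen purely for illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # small model used purely for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit GPTQ quantization, calibrated on the C4 dataset.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Quantizes layer by layer during loading (requires optimum + auto-gptq).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,
)
model.save_pretrained("opt-125m-gptq-4bit")
```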
Quantization-Aware Training (QAT)
Understand the concepts of QAT and how to simulate quantization during training.
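The mechanism behind QAT is "fake quantization": the forward pass sees rounding error, while the backward pass treats the rounding as the identity (a straight-through estimator) so gradients keep flowing. Here is a minimal PyTorch sketch of that idea; the class name is ours, not a library API.

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    """Simulate int8 quantization in the forward pass; pass gradients
    through unchanged (straight-through estimator) in the backward pass."""

    @staticmethod
    def forward(ctx, x, scale):
        q = torch.clamp(torch.round(x / scale), -127, 127)
        return q * scale  # quantize-dequantize: forward sees rounding error

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # ignore round() in the gradient

x = torch.randn(8, requires_grad=True)
scale = x.abs().max().detach() / 127.0
y = FakeQuantSTE.apply(x, scale)
y.sum().backward()
print(x.grad)  # all ones: the gradient flowed straight through
```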
Formats and Tooling
Work with common quantization formats (GGUF, GPTQ) and libraries (Hugging Face Optimum, bitsandbytes).
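For a taste of the tooling, the sketch below loads a model in 4-bit NF4 precision with bitsandbytes via transformers' BitsAndBytesConfig. It assumes the bitsandbytes and accelerate packages are installed and a GPU is available; the model id is illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute, via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # illustrative model id
    device_map="auto",
    quantization_config=bnb_config,
)
print(model.get_memory_footprint())  # compare against the fp16 footprint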
Evaluation and Deployment
Evaluate the performance and accuracy trade-offs of quantized LLMs and understand deployment considerations.
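A standard accuracy check is to compare perplexity (the exponential of the mean negative log-likelihood) between the full-precision and quantized checkpoints on held-out text. This is a minimal sketch of that comparison; the helper function and sample text are ours.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text: str) -> float:
    """Perplexity = exp(mean negative log-likelihood) over the text."""
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
fp16 = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", torch_dtype=torch.float16, device_map="auto"
)
sample = "Quantization trades a little accuracy for large memory savings."
print(f"fp16 perplexity: {perplexity(fp16, tokenizer, sample):.2f}")
# Repeat with the quantized checkpoint and compare the two numbers.
```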