Prerequisites: Python, ML, LLM basics.
Level: Advanced
Advanced Quantization Techniques
Implement and compare LLM quantization methods, covering low-bit (sub-4-bit) and mixed-precision schemes as well as post-training quantization algorithms such as GPTQ and AWQ.
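As a concrete starting point, the sketch below loads a causal LM in 4-bit NF4 precision through the transformers and bitsandbytes integration, one common post-training approach. The model ID, compute dtype, and prompt are illustrative assumptions, not fixed choices from the course.

```python
# Sketch: 4-bit (NF4) post-training quantization at load time via
# transformers + bitsandbytes. The model ID is a placeholder; any
# causal LM from the Hugging Face Hub can be substituted.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-1.3b"  # placeholder model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

inputs = tokenizer("Quantization reduces memory by", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```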
Quantization Calibration
Apply advanced calibration techniques to minimize accuracy loss during LLM quantization.
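To make the calibration idea concrete, here is a minimal, framework-agnostic sketch of absmax calibration for symmetric int8 quantization; the layer and calibration batches are synthetic placeholders, and methods such as GPTQ, AWQ, or SmoothQuant refine this basic scheme rather than replace it.

```python
# Minimal sketch of symmetric int8 calibration: observe activation ranges
# on a small calibration set, derive a per-tensor scale, then quantize.
# The layer and calibration data are synthetic placeholders.
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(512, 512)
calib_batches = [torch.randn(8, 512) for _ in range(16)]  # stand-in calibration set

# 1. Calibration pass: record the largest absolute activation value seen.
absmax = 0.0
with torch.no_grad():
    for batch in calib_batches:
        absmax = max(absmax, layer(batch).abs().max().item())

# 2. Derive a symmetric int8 scale from the observed range.
scale = absmax / 127.0

# 3. Quantize / dequantize with that scale to estimate the error introduced.
def fake_quantize(x: torch.Tensor) -> torch.Tensor:
    q = torch.clamp(torch.round(x / scale), -128, 127)
    return q * scale

with torch.no_grad():
    x = torch.randn(8, 512)
    err = (layer(x) - fake_quantize(layer(x))).abs().mean()
print(f"scale={scale:.5f}, mean abs quantization error={err:.5f}")
```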
Performance Analysis
Evaluate the performance (latency, throughput, memory usage) and accuracy trade-offs of quantized LLMs.
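A simple benchmarking loop like the one below captures latency, throughput, and peak memory. A tiny MLP stands in for the quantized LLM so the script runs anywhere; the batch size and iteration count are illustrative, and in practice you would substitute your real model and tokenized inputs.

```python
# Sketch of a latency / throughput / memory benchmark loop. A small MLP
# is used as a placeholder model; replace `model` and `make_batch` with
# your quantized LLM and real input batches.
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
).to(device).eval()

def make_batch(batch_size: int = 8) -> torch.Tensor:
    return torch.randn(batch_size, 1024, device=device)

# Warm-up so kernel compilation and caching do not skew the timings.
with torch.no_grad():
    for _ in range(5):
        model(make_batch())

n_iters, batch_size = 50, 8
if device == "cuda":
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
start = time.perf_counter()
with torch.no_grad():
    for _ in range(n_iters):
        model(make_batch(batch_size))
if device == "cuda":
    torch.cuda.synchronize()
elapsed = time.perf_counter() - start

print(f"avg latency : {1000 * elapsed / n_iters:.2f} ms/batch")
print(f"throughput  : {n_iters * batch_size / elapsed:.1f} samples/s")  # tokens/s for a real LLM
if device == "cuda":
    print(f"peak memory : {torch.cuda.max_memory_allocated() / 1e6:.1f} MB")
```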
Hardware-Specific Optimization
Optimize quantized LLM inference for different hardware targets, including CPUs and GPUs.
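One way to target different hardware is through ONNX Runtime, as sketched below: int8 dynamic quantization for CPU inference plus execution-provider selection. The paths are placeholders for a model you have already exported to ONNX; nothing here creates that file.

```python
# Sketch: int8 dynamic quantization with ONNX Runtime and per-hardware
# execution-provider selection. "model.onnx" is a placeholder path for a
# previously exported model; it is not produced by this script.
import onnxruntime as ort
from onnxruntime.quantization import quantize_dynamic, QuantType

fp32_path, int8_path = "model.onnx", "model.int8.onnx"  # placeholder paths

# Weight-only int8 dynamic quantization, commonly used for CPU targets.
quantize_dynamic(fp32_path, int8_path, weight_type=QuantType.QInt8)

# Pick execution providers per hardware target: CUDA when available,
# otherwise fall back to the default CPU provider.
providers = (
    ["CUDAExecutionProvider", "CPUExecutionProvider"]
    if "CUDAExecutionProvider" in ort.get_available_providers()
    else ["CPUExecutionProvider"]
)
session = ort.InferenceSession(int8_path, providers=providers)
print("running on:", session.get_providers())
```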
Deployment Frameworks
Use specialized frameworks and libraries (e.g., TensorRT-LLM, vLLM, Text Generation Inference (TGI), ONNX Runtime) to deploy quantized LLMs efficiently.
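As one example of such a framework, the sketch below serves a pre-quantized checkpoint with vLLM's offline LLM API. The model ID and the "awq" quantization setting are assumptions for illustration; the checkpoint must actually have been quantized with the matching method.

```python
# Sketch: running a pre-quantized (AWQ) checkpoint with vLLM's offline API.
# The model ID is a placeholder for any AWQ-quantized checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Llama-2-7B-Chat-AWQ",  # placeholder AWQ checkpoint
    quantization="awq",                    # select the matching kernels
)

params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=64)
outputs = llm.generate(["Explain 4-bit quantization in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```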
Deployment Strategies
Implement deployment strategies for serving quantized LLMs, considering scaling and resource management.
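At the engine level, scaling and resource management come down to a handful of knobs, sketched below for vLLM. The numeric values are illustrative assumptions to be tuned per GPU, not recommended defaults, and the model ID is again a placeholder.

```python
# Sketch: resource-management settings when serving a quantized model with
# vLLM. Values are illustrative assumptions, tuned per deployment target.
from vllm import LLM

llm = LLM(
    model="TheBloke/Llama-2-7B-Chat-AWQ",  # placeholder quantized checkpoint
    quantization="awq",
    tensor_parallel_size=1,        # shard across N GPUs for larger models
    gpu_memory_utilization=0.85,   # fraction of VRAM for weights + KV cache
    max_model_len=4096,            # cap context length to bound KV-cache memory
    max_num_seqs=64,               # limit concurrent sequences per batch
)
```

In production these same settings are typically applied to a serving endpoint (for example vLLM's OpenAI-compatible server or TGI) and combined with replication and autoscaling at the orchestration layer.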