Post Training Quantization (PTQ) and Quantization Aware Training (QAT), PyTorch Contributors, 2019 (PyTorch Foundation) - Official documentation for PyTorch's quantization API, with practical guides and examples for implementing both PTQ and QAT.
A Survey of Quantization Methods for Efficient Neural Network Inference, Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer, 2021 (arXiv preprint arXiv:2103.13630), DOI: 10.48550/arXiv.2103.13630 - A comprehensive academic survey detailing various quantization methods, their theoretical foundations, and practical considerations for efficient neural network inference.
NVIDIA Deep Learning Performance Guide, NVIDIA Corporation, 2023 (NVIDIA Corporation) - An official guide providing best practices for optimizing deep learning model performance on NVIDIA GPUs, including strategies for mixed-precision training and INT8 inference with Tensor Cores.