Compiler Passes for Quantization-Aware Training (QAT)
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko, 2018, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), DOI: 10.48550/arXiv.1712.05877 - This paper introduces the principles of Quantization-Aware Training (QAT), including fake quantization and its role in achieving efficient integer-only inference for deep neural networks; the fake-quantization operation itself is sketched after this reference list.
MLIR Quantization Dialect, The MLIR Authors, 2024 (LLVM Project) - Official documentation describing the Intermediate Representation (IR) constructs and operations for representing quantized computations within the MLIR compiler framework.
Quantization for PyTorch Models, The PyTorch Authors, 2019 (PyTorch Foundation) - Official guide to PyTorch's quantization capabilities, including details on Quantization-Aware Training (QAT) implementation and usage within the framework.
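As context for the references above, here is a minimal sketch of the fake-quantization (quantize-then-dequantize) operation introduced in the Jacob et al. paper: QAT inserts this op into the training graph so the loss observes 8-bit rounding error while all arithmetic stays in floating point. The function name fake_quantize and its parameters (scale, zero_point, num_bits) are illustrative assumptions rather than an API from any of the cited sources.

import numpy as np

def fake_quantize(x: np.ndarray, scale: float, zero_point: int, num_bits: int = 8) -> np.ndarray:
    # Quantize to the integer grid, clamp to the representable range, then dequantize.
    qmin, qmax = 0, 2 ** num_bits - 1                      # e.g. 0..255 for unsigned 8-bit
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale                        # back to float, snapped to the integer grid

# The output is still a float tensor, yet it can take at most 2**num_bits distinct
# values, so the training loss (and its gradients) reflect the quantization error.
x = np.random.randn(4, 4).astype(np.float32)
x_fq = fake_quantize(x, scale=0.02, zero_point=128)

In PyTorch's eager-mode QAT workflow, comparable quantize-dequantize modules are inserted automatically when a model is prepared with torch.ao.quantization.prepare_qat, and the MLIR quant dialect carries the same scale and zero-point information in its quantized tensor types at the IR level.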