Quantization aware training overview, TensorFlow Authors, 2024 - Official guide for Quantization-Aware Training using the TensorFlow Model Optimization Toolkit.
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko, 2018, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), DOI: 10.1109/CVPR.2018.00116 - Paper introducing key concepts and methods for quantizing neural networks for efficient on-device inference, relevant to TensorFlow Lite.
Benchmark TensorFlow Lite models, TensorFlow Authors, 2024 - Official documentation for using the TensorFlow Lite benchmark tool to measure model performance on target devices.