Learning both Weights and Connections for Efficient Neural Networks, Song Han, Jeff Pool, John Tran, William J. Dally, 2015, Advances in Neural Information Processing Systems (NeurIPS), DOI: 10.48550/arXiv.1506.02626 - This paper introduced a three-step train, prune, retrain method for magnitude-based weight pruning, demonstrating a substantial reduction in parameter count with minimal accuracy loss. It is a foundational contribution to model compression; the quantization and Huffman coding stages belong to the follow-up work, Deep Compression (Han, Mao, and Dally).
Pruning in PyTorch, Michela Paganini, 2024 (PyTorch) - Official PyTorch tutorial on the torch.nn.utils.prune module, offering practical guidance on its APIs for applying various pruning techniques.