Learning both Weights and Connections for Efficient Neural Networks, Song Han, Jeff Pool, John Tran, and William J. Dally, 2015. Advances in Neural Information Processing Systems, Vol. 28 (NeurIPS). - Foundational work introducing magnitude-based unstructured pruning and the iterative train-prune-fine-tune approach.
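The magnitude-based unstructured pruning this entry describes can be sketched in a few lines: zero out the fraction of weights with the smallest absolute values, keeping a mask so that fine-tuning can leave pruned positions at zero. This is a minimal NumPy illustration of the idea, not the paper's implementation; the function name and signature are assumptions.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Unstructured magnitude pruning (sketch): zero the smallest-magnitude
    `sparsity` fraction of entries and return (pruned_weights, keep_mask)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask
```

In the iterative train-prune-fine-tune loop, the returned mask would be reapplied after each gradient step so pruned connections stay removed while the surviving weights are retrained.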
Pruning Filters for Efficient ConvNets, Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf, 2017. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1608.08710 - Pioneering work on structured pruning by removing entire filters based on their L1 norm, demonstrating practical speedups.
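The L1-norm filter-ranking criterion from this entry is simple to illustrate: score each output filter of a convolution by the sum of the absolute values of its weights, then keep only the highest-scoring filters, shrinking the layer's output channels. This is a hedged sketch in NumPy under an assumed `(out_channels, in_channels, kh, kw)` weight layout; the function name is hypothetical.

```python
import numpy as np

def prune_filters_l1(conv_weight, n_keep):
    """Structured filter pruning (sketch): keep the n_keep filters with the
    largest L1 norm. conv_weight has shape (out_ch, in_ch, kh, kw)."""
    # L1 norm of each output filter
    l1 = np.abs(conv_weight).sum(axis=(1, 2, 3))
    # indices of the n_keep largest norms, in original channel order
    keep = np.sort(np.argsort(l1)[-n_keep:])
    return conv_weight[keep], keep
```

Because whole filters are removed, the pruned layer is a genuinely smaller dense convolution, which is why this approach yields practical speedups without sparse-kernel support; the next layer's input channels must be sliced with the same `keep` indices.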
Deep Learning Model Compression: A Comprehensive Survey, Haitao Yang, Wenqi Li, Pengyu Li, Bingqiang Jin, and Haoliang Li, 2020. Neurocomputing, Vol. 420 (Elsevier). DOI: 10.1016/j.neucom.2020.07.067 - A comprehensive survey covering model compression techniques, including network pruning (unstructured and structured), quantization, and knowledge distillation.