MLIR: A Compiler Infrastructure for the End of Moore's Law, Chris Lattner, Jacques A. Pienaar, Mehdi Amini, Uday Bondhugula, River Riddle, Albert Cohen, Tatiana Shpeisman, Andy Davis, Nicolas Vasilache, Oleksandr Zinenko, 2020, arXiv, Vol. abs/2002.11054. DOI: 10.48550/arXiv.2002.11054 - Describes the architecture and principles of MLIR, a compiler infrastructure that supports multi-level IRs for various domains, enabling advanced optimizations like memory-aware layout transformations.
NVIDIA cuDNN Developer Guide, NVIDIA Corporation, 2024 (NVIDIA Corporation) - Official guide detailing best practices for using cuDNN, including discussions on tensor data layouts (NCHW, NHWC) and their impact on GPU performance for deep learning operations.
Data Layout Optimization for Deep Learning Training, Shixin Xu, Minghua Chen, Lei Huang, Shaochen Sun, 2020, Proceedings of the VLDB Endowment, Vol. 13 (VLDB Endowment). DOI: 10.14778/3400790.3400827 - Focuses on systematic data layout optimization strategies to improve the performance of deep learning training workloads, addressing the trade-offs of NCHW and NHWC.