CUDA C++ Programming Guide, NVIDIA, 2024 (NVIDIA) - Explains GPU architecture, memory hierarchy, and parallel computation principles fundamental to LLM execution.
Deep Learning Systems: Algorithms, Compilers, and Hardware for AI, Guoliang Wei, Bo Zhang, Yuhao Chen, 2024 (Chapman and Hall/CRC) - Covers the systems aspects of deep learning, including hardware design and optimization for inference.