CUDA C++ Programming Guide, NVIDIA Corporation, 2024 (NVIDIA Corporation) - Official documentation for CUDA memory management concepts, including host and device memory, pinned (page-locked) host memory, and Unified Memory.
TensorFlow: A System for Large-Scale Machine Learning, Martín Abadi, Paul Barham, Jianmin Chen, et al., 2016, 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16) (USENIX Association) - Describes the architecture of TensorFlow, whose runtime uses the BFCAllocator ("best-fit with coalescing") as an example of memory pooling and reuse strategies in ML runtimes.
Memory management, PyTorch Contributors, 2024 (PyTorch Foundation) - Official PyTorch documentation of its caching CUDA memory allocator, covering memory pooling, the difference between allocated and reserved (cached) GPU memory, and allocator configuration through the PYTORCH_CUDA_ALLOC_CONF environment variable.