Computer Architecture: A Quantitative Approach, John L. Hennessy and David A. Patterson, 2017 (Elsevier) - Provides detailed understanding of memory hierarchy, caches, and performance bottlenecks, which is fundamental for interpreting memory access patterns.
NVIDIA CUDA C++ Programming Guide, NVIDIA Corporation, 2023 (NVIDIA Corporation) - Official guide for CUDA programming, detailing memory hierarchy, access patterns, and optimization techniques relevant to GPUs and Nsight profiler usage.
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy, 2018, 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) (USENIX Association) - Discusses how deep learning compilers like TVM apply transformations (e.g., operator fusion, tiling, data layout) that directly impact memory access patterns and overall performance.
Intel VTune Profiler User Guide, Intel Corporation, 2023 (Intel Corporation) - Official documentation for a leading CPU profiler, explaining how to identify cache misses, memory bandwidth issues, and NUMA effects on CPU architectures.