Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems, Vol. 30. DOI: 10.48550/arXiv.1706.03762 - The original paper introducing the Transformer architecture and the self-attention mechanism, which form the foundation of the KV cache.
Attention Mechanisms and Transformers, Aston Zhang, Zachary C. Lipton, Mu Li, Alex Smola, 2023. Chapter in Dive into Deep Learning (Cambridge University Press) - A chapter from an open-source deep learning textbook that clearly explains the Transformer architecture, the self-attention mechanism, and related concepts in an accessible, pedagogical style.