FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness, Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré, 2022, arXiv:2205.14135 - An accessible explanation by the original authors of the underlying hardware principles and practical implications of FlashAttention's memory and speed optimizations.