Theoretical Limits of Compression and Acceleration
Computer Architecture: A Quantitative Approach, John L. Hennessy, David A. Patterson, 2017 (Morgan Kaufmann) - A classic textbook covering computer architecture principles in depth, including hardware limitations, memory hierarchy, and Amdahl's Law.
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness, Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré, 2022 (Advances in Neural Information Processing Systems, NeurIPS), DOI: 10.48550/arXiv.2205.14135 - Presents an optimized attention mechanism that significantly enhances the speed and memory efficiency of Transformers by addressing memory I/O challenges, illustrating how algorithm improvements can advance performance.