Apache Spark Documentation, The Apache Software Foundation, 2024 - Covers the core concepts and architecture of Apache Spark, essential for understanding distributed data processing frameworks for large-scale embedding generation.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc.), DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture, which forms the basis for many modern, computationally intensive embedding models used in large-scale RAG systems.
NVIDIA Triton Inference Server Documentation, NVIDIA Corporation, 2024 - Details how to deploy, scale, and manage AI models for high-performance inference, covering operational aspects crucial for efficient GPU-accelerated embedding generation.