Mixtral of Experts, Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed, 2024, arXiv preprint arXiv:2401.04088, DOI: 10.48550/arXiv.2401.04088 - Introduces Mixtral 8x7B, a sparsely activated Mixture of Experts (MoE) model; it provides crucial context for the 'Target MoE Model' referenced in speculative decoding and demonstrates the performance achievable with such large-scale MoE architectures.
Speculative Decoding: Faster LLM Inference with Small Models, Alex M. Dai, Shibo Wang, 2023, Google AI Blog - Provides an accessible explanation of speculative decoding, covering its principles and benefits, and serves as a useful high-level overview of the technique.