Conditional Computation and Mixture-of-Experts (MoE)
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, William Fedus, Barret Zoph, Noam Shazeer, 2022, Journal of Machine Learning Research, Vol. 23 (Microtome Publishing) - Presents Switch Transformers, a simplified MoE architecture with top-1 (k=1) routing that scales to extremely large models while addressing training stability and load balancing.
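
To make the k=1 routing idea concrete, below is a minimal sketch (not the paper's implementation) of a switch-style MoE feed-forward layer in PyTorch: each token is routed to the single expert with the highest router probability, and an auxiliary load-balancing loss of the form described in the paper (N · Σ f_i · P_i) encourages the router to spread tokens evenly. Class and argument names here (`SwitchFFN`, `aux_weight`, etc.) are illustrative assumptions, not part of the original codebase.

```python
# Minimal sketch of top-1 ("switch") routing with a load-balancing auxiliary loss.
# Hypothetical names; capacity limits and expert parallelism are omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchFFN(nn.Module):
    """Switch-style MoE layer: each token goes to exactly one expert (k=1)."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int, aux_weight: float = 0.01):
        super().__init__()
        self.num_experts = num_experts
        self.aux_weight = aux_weight
        self.router = nn.Linear(d_model, num_experts)  # token -> expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor):
        # x: (batch, seq, d_model) -> flatten tokens for routing
        tokens = x.reshape(-1, x.shape[-1])
        probs = F.softmax(self.router(tokens), dim=-1)   # (tokens, experts)
        gate, expert_idx = probs.max(dim=-1)             # top-1 expert per token

        out = torch.zeros_like(tokens)
        for e in range(self.num_experts):
            mask = expert_idx == e
            if mask.any():
                # Scale each expert's output by its gate value so the router receives gradients.
                out[mask] = gate[mask].unsqueeze(-1) * self.experts[e](tokens[mask])

        # Load-balancing loss: num_experts * sum_e (fraction of tokens routed to e)
        # * (mean router probability of e); minimized when the load is uniform.
        frac_tokens = F.one_hot(expert_idx, self.num_experts).float().mean(dim=0)
        mean_probs = probs.mean(dim=0)
        aux_loss = self.aux_weight * self.num_experts * torch.sum(frac_tokens * mean_probs)

        return out.reshape_as(x), aux_loss


if __name__ == "__main__":
    layer = SwitchFFN(d_model=64, d_ff=256, num_experts=4)
    y, aux = layer(torch.randn(2, 10, 64))
    print(y.shape, aux.item())
```

Because only one expert runs per token, compute per token stays roughly constant as the number of experts (and thus total parameters) grows, which is the core conditional-computation trade-off the paper exploits.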