While the gating network provides the dynamic routing that makes Mixture of Experts models powerful, it also introduces a significant training challenge: the potential for severe load imbalance. If the model is not properly incentivized, the gating network might learn to route a disproportionate number of tokens to a small, favored subset of experts. This behavior undermines the entire principle of MoE, as it leaves most of the model's capacity underutilized and leads to training instabilities.

To address this, MoE models incorporate an auxiliary loss function specifically designed to encourage a balanced distribution of tokens across all available experts. This loss is added to the primary task loss (e.g., cross-entropy for a language model) during training, guiding the router toward more equitable decisions.

### The Problem of Preferential Treatment

Imagine a gating network that, early in training, discovers that one or two experts are slightly better at processing certain common tokens. Through backpropagation, the router's weights are updated to favor these experts even more. This creates a feedback loop: the favored experts receive more training data, become more specialized and effective, and are therefore chosen even more frequently.

This phenomenon, often called expert collapse, has two major negative consequences:

- **Under-trained experts:** The neglected experts receive few or no tokens. Their weights are rarely updated, and they never learn any useful specialization. They become "dead" parameters, contributing nothing to the model's performance.
- **Inefficient capacity use:** The model might have a massive parameter count, but if only a fraction of those parameters is ever active, the effective model capacity is much smaller. You pay the memory cost for a large model without reaping the computational and performance benefits.

The diagram below illustrates the difference between an unbalanced state, where a few experts dominate, and the desired balanced state.

[Figure: "Expert Load Distribution" bar chart. x-axis: Expert 1 through 8; y-axis: Fraction of Tokens per Batch. In the unbalanced view, Experts 1 and 5 each receive roughly 45% of the tokens while the others receive 1–2%; in the balanced view, every expert receives 12.5%.]

An unbalanced load concentrates computation on a few experts, leading to collapse. The auxiliary loss encourages the router to achieve a balanced load, ensuring all experts are utilized.

### Formulating the Load Balancing Loss

The goal is to create a loss term that penalizes the router for imbalance. The most common approach, introduced in the Switch Transformer paper, is to compute a value from the distribution of tokens and router probabilities across a batch.

Let's define two quantities for a batch of tokens, where $N$ is the total number of experts:

- **Fraction of tokens per expert ($f_i$):** the share of the batch's tokens that are sent to expert $i$. With $B$ tokens in the batch and top-1 gating, this is simply the count of tokens routed to expert $i$, divided by $B$.
- **Average router probability per expert ($P_i$):** the average probability (gating score) that the router assigns to expert $i$ across all tokens in the batch. It represents the "importance" the router gives to an expert, regardless of whether that expert was ultimately chosen.

The auxiliary loss, $L_{aux}$, is the dot product of these two vectors, scaled by the number of experts $N$:

$$ L_{aux} = N \cdot \sum_{i=1}^{N} f_i \cdot P_i $$

To minimize this loss, the model must prevent any single expert $i$ from having both a high token fraction $f_i$ and a high average probability $P_i$. Because tokens are routed to the experts the router scores most highly, $f_i$ tends to track $P_i$, and the loss is smallest when both distributions are uniform: with $f_i = P_i = 1/N$ for every expert, $L_{aux} = N \cdot N \cdot \tfrac{1}{N^2} = 1$. Note that $f_i$ is a hard count and therefore not differentiable; the gradient reaches the router through the $P_i$ terms.
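To make the computation concrete, here is a minimal PyTorch sketch of this auxiliary loss. The function name `load_balancing_loss` and the tensor shapes are illustrative assumptions rather than the API of any particular framework; the sketch assumes top-1 routing over the raw router logits for one batch.

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor, num_experts: int) -> torch.Tensor:
    """Auxiliary load balancing loss: N * sum_i (f_i * P_i).

    router_logits: raw gating scores of shape (num_tokens, num_experts).
    """
    # P_i: average router probability assigned to each expert over the batch.
    probs = F.softmax(router_logits, dim=-1)          # (num_tokens, num_experts)
    avg_prob_per_expert = probs.mean(dim=0)           # (num_experts,)

    # f_i: fraction of tokens whose top-1 choice is expert i. This is a hard
    # count, so gradients flow into the router only through the P_i term above.
    top1_expert = probs.argmax(dim=-1)                # (num_tokens,)
    fraction_per_expert = F.one_hot(top1_expert, num_experts).float().mean(dim=0)

    return num_experts * torch.sum(fraction_per_expert * avg_prob_per_expert)
```

In practice the returned value sits at or just above its minimum of 1 when routing is balanced, which makes it a convenient quantity to monitor during training: the further it drifts upward, the more skewed the routing has become.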
### The Final Loss Function

This auxiliary loss is combined with the main task loss, $L_{task}$, to form the total loss used for backpropagation:

$$ L_{total} = L_{task} + \alpha \cdot L_{aux} $$

The hyperparameter $\alpha$ (alpha), often exposed as `load_balance_loss_coef`, is a small scalar that controls the strength of the balancing incentive:

- If $\alpha$ is too small, the balancing force is too weak to prevent expert collapse.
- If $\alpha$ is too large, the model may prioritize perfect load balancing at the expense of its performance on the main task, leading to poor overall accuracy.

Finding a suitable value for $\alpha$ is a standard part of the hyperparameter tuning process for MoE models; a common starting point is around 0.01. This simple but effective mechanism is a standard component in nearly all MoE training pipelines, acting as the essential regulator that allows these large, sparse models to be trained effectively.
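As a closing illustration, the sketch below shows how the two terms might be combined in a single training step, reusing the `load_balancing_loss` helper sketched earlier; the tensor shapes, random inputs, and coefficient value are purely illustrative.

```python
import torch
import torch.nn.functional as F

# Toy shapes for illustration: 32 tokens, a vocabulary of 100, 8 experts.
num_tokens, vocab_size, num_experts = 32, 100, 8
logits = torch.randn(num_tokens, vocab_size, requires_grad=True)          # model outputs
targets = torch.randint(0, vocab_size, (num_tokens,))                     # target token ids
router_logits = torch.randn(num_tokens, num_experts, requires_grad=True)  # gating scores

load_balance_loss_coef = 0.01  # the alpha hyperparameter

task_loss = F.cross_entropy(logits, targets)                # primary objective
aux_loss = load_balancing_loss(router_logits, num_experts)  # helper from the earlier sketch
total_loss = task_loss + load_balance_loss_coef * aux_loss

# One backward pass propagates both signals: the task loss trains the model,
# while the auxiliary term nudges the router toward a balanced token distribution.
total_loss.backward()
```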