GLaM: Efficient Scaling of Language Models with Mixture-of-Experts. Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, Barret Zoph, Liam Fedus, Maarten Bosma, Zongwei Zhou, Tao Wang, Yu Emma Wang, Kellie Webster, Marie Pellat, Kevin Robinson, Kathleen Meier-Hellstern, Toju Duke, Lucas Dixon, Kun Zhang, Quoc V. Le, Yonghui Wu, Zhifeng Chen, Claire Cui. International Conference on Machine Learning (ICML), 2022. DOI: 10.48550/arXiv.2112.06905 - Proposes a large-scale mixture-of-experts (MoE) language model and discusses practical considerations for its inference-time efficiency, including memory and compute costs at scale.
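The efficiency idea behind a sparsely activated MoE model like GLaM is that each token is routed to only a small subset of experts (GLaM uses the top two per MoE layer), so the compute per token stays far below the total parameter count. A minimal sketch of top-2 gated routing, assuming a hypothetical `top2_moe_layer` helper and dense per-token dispatch (not the paper's sharded implementation):

```python
import numpy as np

def top2_moe_layer(x, w_gate, experts):
    """Route each token to its top-2 experts, combining their outputs
    weighted by renormalized gate scores. Minimal illustrative sketch;
    real MoE layers batch tokens per expert and shard across devices."""
    logits = x @ w_gate                                  # (tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)           # softmax gate
    top2 = np.argsort(probs, axis=-1)[:, -2:]            # top-2 expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gates = probs[t, top2[t]]
        gates = gates / gates.sum()                      # renormalize over the chosen 2
        for g, e in zip(gates, top2[t]):
            out[t] += g * experts[e](x[t])               # only 2 of n experts execute
    return out
```

Because only two experts run per token, FLOPs per token scale with two expert widths rather than the full expert count, which is the source of the efficiency the annotation refers to; total parameter memory, however, still scales with all experts, hence the paper's attention to memory at inference.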