MegaBlocks: Efficient Sparse Training with Mixture-of-Experts, Shaden Smith, Brandon Norick, Sam Ade Jacobs, Jonathan Frankle, Jeremy Gray, Elias Frantar, Tal Ben-Nun, Dan Alistarh, 2023. International Conference on Machine Learning (ICML), Vol. 202 - Focuses on memory-efficient, compute-optimized training of large sparse mixture-of-experts models.