MegaBlocks: Efficient Sparse Training with Mixture-of-Experts, Trevor Gale, Deepak Narayanan, Cliff Young, Matei Zaharia, 2023, Proceedings of Machine Learning and Systems (MLSys), Vol. 5 - Reformulates MoE computation as block-sparse operations with custom GPU kernels, enabling efficient training of large sparse MoE models without dropping tokens or padding expert batches.