An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. 2020. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.2010.11929 - Introduces the Vision Transformer (ViT) architecture, which forms the foundation for applying mixture-of-experts models to image data.
Vision MoE: An Empirical Study of Scaling Laws for MoE in Vision. William Fedus, Jeff Dean, Zhifeng Chen, Yuanzhong Xu, Anna Goldie, Basil Mustafa, Anushan Fernando, George Tucker, Yonghui Wu, David So, Blake Hechtman, Barret Zoph, David R. So, Aditya Sharma, Hieu Pham, Quoc V. Le, Paul Barham, Daniel N. Freeman, Albin Cassirer, Jiantao Jiao, Shibo Wang, Claire Cui, Ewa Dominowska, H. Yang, A. Mirhoseini. 2022. International Conference on Machine Learning (ICML). DOI: 10.48550/arXiv.2203.05605 - Studies the application of mixture-of-experts models in vision Transformers, detailing the scaling behavior and performance of large vision models.