Principles of Conditional Computation

References

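Adaptive Mixtures of Local Experts, Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton, 1991, Neural Computation, Vol. 3, No. 1 (MIT Press), DOI: 10.1162/neco.1991.3.1.79 - Introduces the foundational mixture-of-experts architecture, in which different "experts" specialize in different regions of the input space and a gating network decides how much each expert contributes to the output.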
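Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean, 2017, arXiv preprint arXiv:1701.06538, DOI: 10.48550/arXiv.1701.06538 - Proposes the sparsely-gated mixture-of-experts layer, in which a trainable gating network activates only a few experts for each example. This key innovation lets a network scale to an enormous number of parameters with only a modest increase in per-example computation; in the paper, the MoE layers are placed between stacked LSTM layers for language modeling and machine translation.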
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen, 2021, International Conference on Learning Representations (ICLR 2021), DOI: 10.48550/arXiv.2006.16668 - Describes a system for automatically sharding and training giant conditional-computation models, including sparse MoE Transformers, across thousands of accelerators. Using it, the authors scale a multilingual translation model beyond 600 billion parameters, trained on 2048 TPU v3 accelerators.