Safely Integrating New Data Sources

