Scaling Laws for Neural Language Models, Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei, 2020. arXiv preprint arXiv:2001.08361. DOI: 10.48550/arXiv.2001.08361 - Proposes empirical scaling laws relating language-model performance to model size, dataset size, and training compute (a sketch of the reported power-law form follows this list).
Emergent Abilities of Large Language Models, Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus, 2022. Transactions on Machine Learning Research (JMLR.org). DOI: 10.48550/arXiv.2206.07682 - Defines and characterizes emergent abilities: capabilities of large language models that appear qualitatively as scale increases.
Training Compute-Optimal Large Language Models, Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre, 2022. arXiv preprint arXiv:2203.15556. DOI: 10.48550/arXiv.2203.15556 - Studies the optimal balance between model size and training data under a given compute budget (see the compute-optimal sketch after this list).
Language Models are Few-Shot Learners, Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei, 2020. Advances in Neural Information Processing Systems, Vol. 33 (NeurIPS) - Demonstrates, through scaling, the ability of large language models to perform few-shot learning and follow instructions.
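
For the scaling-laws entry above, a minimal sketch of the power-law form the paper reports; the fitted constants are approximate values as reported in the paper and should be treated as indicative rather than exact:

```latex
% Kaplan et al. (2020): test loss as a power law in each resource,
% valid when the other two resources are not the bottleneck.
% N: non-embedding parameters, D: dataset size in tokens.
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N},
  \qquad \alpha_N \approx 0.076,\ N_c \approx 8.8 \times 10^{13}
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D},
  \qquad \alpha_D \approx 0.095,\ D_c \approx 5.4 \times 10^{13}
```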
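
For the compute-optimal entry, a minimal Python sketch of the rule-of-thumb allocation, assuming the common C ≈ 6ND estimate of training FLOPs and the roughly-20-tokens-per-parameter takeaway often quoted from the paper (both are approximations, not exact constants from the paper's fits):

```python
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Rough compute-optimal parameter/token split for a training budget.

    Combines the standard estimate C ~= 6 * N * D (training FLOPs) with the
    Chinchilla takeaway that data should scale in proportion to model size,
    here D ~= 20 * N. Both are approximations, so treat results as ballpark.
    """
    # From C = 6 * N * D and D = r * N:  N = sqrt(C / (6 * r)).
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla's own budget (~5.76e23 FLOPs) should land near the paper's
# 70B-parameter, 1.4T-token configuration.
n, d = chinchilla_optimal(5.76e23)
print(f"params ~ {n:.3g}, tokens ~ {d:.3g}")  # params ~ 6.93e+10, tokens ~ 1.39e+12
```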