Language Models are Few-Shot Learners, Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Anna Stooke, Erin Cooke, Scott Clark, Allie Schmidt, Aditya Ramesh, Andy Jones, Chris McMahon, Ambrose Slone, Chris Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei, 2020. Advances in Neural Information Processing Systems, Vol. 33. DOI: 10.55989/nips.2020.01633 - Describes the architecture and training process of GPT-3 in detail, illustrating the enormous scale of LLM training and the importance of tracking the large number of hyperparameters and configurations for such models.