Scaling Laws for Neural Language Models, Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei, 2020arXiv preprint arXiv:2001.08361DOI: 10.48550/arXiv.2001.08361 - 研究论文分析大型语言模型训练损失与模型及数据集规模的行为关系,为性能设定预期。