Scaling Laws for Neural Language Models, Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei, 2020. arXiv preprint arXiv:2001.08361. DOI: 10.48550/arXiv.2001.08361 - This paper empirically establishes power-law relationships between model performance and model size, dataset size, and training compute. It clarifies how the quantity of training data affects model capabilities.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu, 2020. Journal of Machine Learning Research, Vol. 21 - This paper introduces the T5 model and the C4 dataset, a widely used public dataset sourced from Common Crawl. It offers a concrete example of the scale and curation effort involved in preparing training data for LLMs.
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜, Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell, 2021. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (Association for Computing Machinery). DOI: 10.1145/3442188.3445922 - This work critically examines the ethical and societal risks of large language models, focusing on biases and limitations rooted in their extensive training datasets.