Language Models are Few-Shot Learners, Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei, 2020. Advances in Neural Information Processing Systems, Vol. 33 (Curran Associates, Inc.). DOI: 10.48550/arXiv.2005.14165 - Introduces GPT-3 and highlights the immense parameter scale that laid the groundwork for many of the operational challenges discussed.
TensorRT-LLM Documentation, NVIDIA Corporation, 2025 (NVIDIA) - Official documentation for NVIDIA's high-performance inference library for large language models, offering practical guidance on deployment and optimization.
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜, Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell, 2021. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (Association for Computing Machinery). DOI: 10.1145/3442188.3445922 - A critical paper discussing the ethical and societal risks of large language models, particularly concerning data quality, bias, and the potential for harmful outputs.