Learning Transferable Visual Models From Natural Language Supervision, Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever, 2021. Proceedings of the 38th International Conference on Machine Learning (ICML), Vol. 139. DOI: 10.48550/arXiv.2103.00020 - Explains the CLIP model, which forms the basis for the CLIP score used in monitoring prompt adherence for generated images.