Learning Transferable Visual Models From Natural Language Supervision, Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever, 2021. Proceedings of the 38th International Conference on Machine Learning (ICML), Vol. 139. DOI: 10.48550/arXiv.2103.00020 - Explains the CLIP model, which underlies the CLIP score used to monitor how consistent generated images are with their prompts.