Learning Transferable Visual Models From Natural Language Supervision, Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever, 2021. Proceedings of the 38th International Conference on Machine Learning, Vol. 139 (PMLR). DOI: 10.48550/arXiv.2103.00020 - Introduces Contrastive Language-Image Pre-training (CLIP), a model that learns a shared embedding space for text and images, providing a powerful text encoder and guidance mechanism for text-to-image synthesis.