Learning Transferable Visual Models From Natural Language Supervision, Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever, 2021. Proceedings of the 38th International Conference on Machine Learning, Vol. 139 (PMLR). DOI: 10.5555/3540306.3540445 - This paper introduces CLIP, a model that uses contrastive learning to learn robust shared representations for images and text. The shared embedding space supports direct cross-modal comparison, and the paper demonstrates its practical usefulness.