Learning Transferable Visual Models From Natural Language Supervision, Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever, 2021. Proceedings of the 38th International Conference on Machine Learning (ICML). DOI: 10.48550/arXiv.2103.00020 - This paper introduces the CLIP model, which learns a shared representation of images and text through contrastive pre-training on a large dataset of image-text pairs. It serves as an example of how shared representations are learned and applied, particularly for cross-modal retrieval.