Show and Tell: A Neural Image Caption Generator, Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, 2015, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE), DOI: 10.1109/CVPR.2015.7298918 - A foundational paper introducing one of the first successful end-to-end neural models for generating natural language captions from images, exemplifying vision-language multimodal AI.