VQA: Visual Question Answering, Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, 2015. Proceedings of the IEEE International Conference on Computer Vision (ICCV) (IEEE). DOI: 10.1109/ICCV.2015.337 - This paper introduces the Visual Question Answering (VQA) task, a prominent example of multimodal AI in which a system must combine visual and textual information to answer natural-language questions about an image. It illustrates how multimodal models open up new applications.