We've learned that multimodal AI systems are designed to work with various types of data simultaneously, such as text, images, and audio. You might be wondering: why add this complexity? Why not just stick to AI systems that handle one type of data at a time? It turns out that combining information from multiple modalities offers several significant advantages, making AI systems more powerful, reliable, and versatile. Let's look at some of these benefits.
Humans naturally use multiple senses to understand the world. If you hear a meow, you might guess there's a cat nearby. If you see a furry, four-legged creature and hear a meow simultaneously, you're much more certain it's a cat. Multimodal AI aims to give systems a similar, richer understanding by integrating information from different sources.
Consider understanding human communication. Text alone can sometimes be ambiguous. For example, the phrase "Oh, that's just great" could be sincere or sarcastic. Hearing the speaker's flat, exasperated tone of voice (audio) and seeing them roll their eyes (visual) makes the sarcasm far easier to detect.
By processing these multiple cues (text, audio, and visual), a multimodal AI can arrive at a more accurate interpretation, much like a human would. This ability to form a holistic understanding is a primary driver for developing multimodal systems. It allows AI to grasp context and subtlety that might be missed by looking at a single data type in isolation.
The following diagram shows how different data types can be combined by a multimodal AI system.
Different data types are processed by a multimodal AI system to produce a more complete understanding.
When an AI system relies on a single source of information, it can be easily misled if that information is noisy, incomplete, or ambiguous. Combining modalities provides a way to cross-verify information and improve the system's overall reliability and accuracy.
Imagine a speech recognition system trying to transcribe what someone is saying in a very noisy coffee shop. The audio alone may be garbled by background chatter, but if the system can also see the speaker's lip movements (visual data), it can recover words that the audio stream missed.
Similarly, if an AI is trying to identify an object in an image that is partially hidden or blurry, an accompanying text description ("There's a red car partially behind the tree") can provide the necessary clues for a correct identification. One modality can compensate for the weaknesses or ambiguities of another.
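A common way to implement this kind of cross-checking is late fusion: each modality produces its own prediction, and the predictions are then combined. The following is a minimal sketch of the idea in Python; the class labels, probability values, and equal weights are invented purely for illustration, not taken from any particular model.

```python
import numpy as np

classes = ["cat", "dog", "bird"]

# Hypothetical per-modality predictions (probability distributions over the
# same set of classes). The blurry image alone is ambiguous; the audio of a
# meow is much more decisive.
image_probs = np.array([0.40, 0.35, 0.25])  # output of an image classifier
audio_probs = np.array([0.80, 0.10, 0.10])  # output of an audio classifier

# Late fusion: a weighted average of the two distributions. The weights could
# reflect how much each modality is trusted under current conditions.
weights = {"image": 0.5, "audio": 0.5}
fused = weights["image"] * image_probs + weights["audio"] * audio_probs

for label, prob in zip(classes, fused):
    print(f"{label}: {prob:.2f}")  # "cat" now clearly dominates at 0.60
```

Neither modality needs to be perfect on its own; the combined distribution is already more decisive than the ambiguous image prediction by itself.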
Some of the most innovative AI applications are inherently multimodal. They simply wouldn't be possible if the AI could only process one type of data. Here are a few examples we'll touch upon later in this course:

- Generating a text caption that describes the content of an image.
- Producing an image from a written description.
- Answering spoken or written questions about a photograph or video.

These applications require the AI not only to process multiple modalities, but also to find relationships and translate information between them.
Humans communicate and interact with the world multimodally. We speak, gesture, write, draw, and interpret facial expressions, often all at once. AI systems that can also understand and use multiple modalities can lead to more natural and intuitive ways for humans to interact with computers.
Think about interacting with a smart home assistant. Instead of typing a precise command, you might point at a lamp and say, "Turn that one off." Fulfilling the request means combining your spoken words with the visual gesture, something a single-modality system couldn't do.
As AI becomes more integrated into our daily lives, the ability to interact with it through a combination of voice, touch, vision, and text will make technology feel less like a tool we operate and more like a partner we collaborate with.
In many real-world scenarios, data isn't perfect. One data stream might be corrupted, missing, or of low quality. Multimodal systems can be designed to be more resilient in such situations.
For instance, consider a security system designed to identify individuals. If the camera feed is too dark or partially blocked, the system can lean on voice recognition; if the environment is too noisy for reliable audio, the camera can carry more of the weight.
By having multiple sources of information, the AI system has a better chance of performing its task effectively even when some of the input data is compromised. The system isn't solely reliant on one potentially fragile data stream.
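One simple way to build in this kind of resilience is to fuse only the modalities that actually delivered usable data and renormalize their weights. Below is a minimal sketch of that idea; the modality names, trust weights, and probability values are hypothetical and chosen only to illustrate the mechanism.

```python
import numpy as np

def fuse(predictions, weights):
    """Average the predictions of whichever modalities produced usable output.

    predictions: dict of modality name -> probability vector, or None when
                 that modality's data was missing or too corrupted to use.
    weights:     dict of modality name -> trust weight.
    """
    available = {m: np.asarray(p) for m, p in predictions.items() if p is not None}
    total = sum(weights[m] for m in available)
    # Renormalize the weights of the remaining modalities so they sum to 1.
    return sum((weights[m] / total) * p for m, p in available.items())

# The camera feed is unusable (too dark), so only the voice model contributes.
predictions = {"face": None, "voice": [0.9, 0.1]}
weights = {"face": 0.6, "voice": 0.4}
print(fuse(predictions, weights))  # -> [0.9 0.1]; the system still answers
```

In practice, the trust weights are often predicted from the inputs themselves, so a noisy or low-quality modality is automatically down-weighted, but the underlying idea is the same: no single data stream becomes a single point of failure.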
In summary, combining multiple modalities allows AI systems to gain a deeper understanding, make more reliable decisions, tackle a wider range of tasks, interact with us more naturally, and perform better even when faced with imperfect information. These benefits are why multimodal AI is an increasingly important and active area of study and development.