Now that you have a better understanding of what data modalities are and how Multimodal AI aims to process them together, let's put that knowledge into practice. This exercise will help you become more observant of the different types of data that everyday technologies handle. Being able to identify these modalities is a fundamental first step before thinking about how an AI system might process them.
Your goal is to look at some common technologies you likely use or are familiar with. For each one, think about:
Let's work through a few examples together. Try to think them through yourself before reading our analysis.
Think about it: How do you interact with it? What does it do in response?
Our Analysis:
This is a classic example of a system that, at its core, processes audio information but often uses simple visual cues.
Think about it: What are all the ways you and others share information during a video call?
Our Analysis:
Video conferencing is inherently multimodal, combining sight, sound, and written communication.
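One way to keep track of your observations is to record them in a small data structure. The sketch below is only an illustration, not part of any library: the `Modality` labels, the `TechnologyProfile` class, and the mapping of "sight, sound, and written communication" to video, audio, and text are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Modality(Enum):
    """Hypothetical labels for the data types discussed in this exercise."""
    TEXT = "text"
    AUDIO = "audio"
    IMAGE = "image"
    VIDEO = "video"


@dataclass
class TechnologyProfile:
    """Illustrative record of the modalities one technology handles."""
    name: str
    inputs: set[Modality] = field(default_factory=set)
    outputs: set[Modality] = field(default_factory=set)

    def modalities(self) -> set[Modality]:
        # Every modality the technology touches, as input or output.
        return self.inputs | self.outputs

    def is_multimodal(self) -> bool:
        # More than one distinct modality counts as multimodal here.
        return len(self.modalities()) > 1


# Assumed mapping: sight -> VIDEO, sound -> AUDIO, written chat -> TEXT.
video_call = TechnologyProfile(
    name="video conferencing",
    inputs={Modality.VIDEO, Modality.AUDIO, Modality.TEXT},
    outputs={Modality.VIDEO, Modality.AUDIO, Modality.TEXT},
)

print(sorted(m.value for m in video_call.modalities()))  # ['audio', 'text', 'video']
print(video_call.is_multimodal())  # True
```

You could fill in a profile like this for each technology you analyze and compare which modalities appear as inputs, as outputs, or as both.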
Think about it: When you use this app, what do you upload? What do you see and interact with?
Our Analysis:
These platforms are rich in different types of media, making them prime examples of multimodal information environments.
Think about it: How do you find a restaurant? How do you place an order? What information does the app provide?
Our Analysis:
Even a seemingly straightforward app, such as one for food delivery, relies on multiple types of data to function effectively.
Think about two or three other pieces of technology you use regularly. They could be:
For each one, jot down:
Once you've analyzed a few technologies, consider these questions:
This exercise isn't just about listing data types. It's about starting to see the world through the lens of multimodal information. As we progress through this course, you'll learn how AI systems are designed to understand and generate these different forms of data, often in a coordinated way, similar to how humans perceive and interact with the world. Recognizing these modalities in existing technology is the first step toward appreciating the complexity and potential of Multimodal AI.
© 2025 ApX Machine Learning