Now that we understand what computer vision aims to achieve, let's look at some specific problems or 'tasks' that computer vision systems are designed to solve. Think of these as common questions we ask a computer about an image or video. For this introductory course, we'll just define a few fundamental ones. Later chapters will touch upon the techniques used to tackle some of these.
Imagine you have a picture, and you want the computer to tell you what the single most prominent subject in that picture is. Is it a cat? A dog? A car? A house? This task is called Image Classification.
The goal is straightforward: assign one label (a category name) to an entire image. The system looks at the whole image and decides which predefined category it belongs to.
For example, given an input image, a classification system might output the label "cat". It doesn't tell you where the cat is in the image, just that the image contains a cat. This is often one of the first tasks people learn in computer vision because it establishes a foundation for understanding how machines can interpret visual content.
An image classification system takes an image and outputs a single category label for the entire image.
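The image-in, single-label-out interface can be sketched in a few lines of Python. The rule below (labeling images by average brightness) and the class names "day"/"night" are purely illustrative assumptions; real classifiers learn far richer decision rules, but they share the same shape: whole image in, one category label out.

```python
def classify_image(image):
    """Assign one label to an entire image.

    `image` is a 2D list of grayscale values in [0, 255]. This toy rule
    calls bright images "day" and dark images "night" -- a stand-in for
    a learned classifier, with the same input/output contract.
    """
    pixels = [value for row in image for value in row]
    mean_brightness = sum(pixels) / len(pixels)
    return "day" if mean_brightness > 128 else "night"

bright = [[200, 210], [190, 220]]
dark = [[30, 25], [40, 20]]
print(classify_image(bright))  # day
print(classify_image(dark))    # night
```

Note that the function returns exactly one label and says nothing about where anything is in the image, which is precisely the limitation detection addresses next.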
What if you not only want to know that there's a car in the image, but also where it is? And what if there are multiple objects you care about? This is where Object Detection comes in.
Object detection is a step beyond classification. Its goal is to find instances of objects from certain categories within an image and determine their location. Typically, the location is indicated by drawing a rectangular bounding box around each detected object.
So, for an image of a busy street, an object detection system might identify multiple cars, pedestrians, and traffic lights, drawing a box around each one and labeling it accordingly (e.g., "car", "person", "traffic light"). This provides more detailed information about the image content compared to classification.
An object detection system identifies multiple objects in an image and indicates their positions, usually with bounding boxes and labels.
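A detection result is richer than a classification result: it is a list of labeled bounding boxes rather than a single label. The sketch below shows one common way to represent boxes, as `(x_min, y_min, x_max, y_max)` pixel coordinates, along with intersection over union (IoU), a standard score for how much two boxes overlap. The example coordinates and class names are made up for illustration.

```python
# One detection = a label plus a bounding box (x_min, y_min, x_max, y_max).
detections = [
    ("car",           (34, 120, 210, 260)),
    ("person",        (250, 90, 300, 270)),
    ("traffic light", (400, 10, 430, 95)),
]

def box_area(box):
    x_min, y_min, x_max, y_max = box
    # Clamp to zero so an "empty" box (max < min) has area 0.
    return max(0, x_max - x_min) * max(0, y_max - y_min)

def iou(box_a, box_b):
    """Intersection over union: a score in [0, 1] for how much two
    boxes overlap (1 means identical, 0 means disjoint)."""
    inter_box = (
        max(box_a[0], box_b[0]), max(box_a[1], box_b[1]),
        min(box_a[2], box_b[2]), min(box_a[3], box_b[3]),
    )
    inter = box_area(inter_box)
    union = box_area(box_a) + box_area(box_b) - inter
    return inter / union if union else 0.0

for label, box in detections:
    print(label, box)
print(iou((0, 0, 10, 10), (0, 0, 10, 5)))  # 0.5
```

IoU is worth knowing early because it is how detection systems are typically judged: a predicted box "counts" as finding an object when its IoU with the true box exceeds some threshold.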
Sometimes, a bounding box isn't precise enough. What if you need to know the exact shape of an object, down to the individual pixel level? For instance, in medical imaging, you might want to outline the precise boundary of a tumor. This task is called Image Segmentation.
Image segmentation involves partitioning an image into multiple segments or regions. The goal is typically to assign a category label to every single pixel in the image. Pixels belonging to the same object category get the same label.
Imagine coloring a photograph: all pixels that are part of a car might be colored red, all pixels that are part of the road blue, and all pixels belonging to trees green. This results in a detailed map of the image where each pixel's category is known. It provides a much finer-grained understanding of the image content than classification or object detection.
Image segmentation assigns a category label to every pixel in the image, effectively outlining the shapes of different objects or regions.
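A segmentation result can be stored as a label map: a grid the same size as the image where each entry is the class of that pixel. The tiny 4x4 map and the class ids below are illustrative assumptions, but they show the key property that distinguishes segmentation: every pixel gets exactly one label.

```python
# Illustrative class ids for a toy street scene.
CLASS_NAMES = {0: "road", 1: "car", 2: "tree"}

# A 4x4 "segmented image": one class id per pixel.
label_map = [
    [2, 2, 0, 0],
    [2, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

def pixel_counts(label_map):
    """Count how many pixels belong to each category.

    Because every pixel carries a label, we can answer questions like
    "how much of this image is road?" -- something neither a single
    image label nor a set of bounding boxes can tell us exactly.
    """
    counts = {}
    for row in label_map:
        for class_id in row:
            name = CLASS_NAMES[class_id]
            counts[name] = counts.get(name, 0) + 1
    return counts

print(pixel_counts(label_map))  # {'tree': 3, 'road': 9, 'car': 4}
```

In the coloring analogy from above, `label_map` is the coloring itself: the `1`s trace the exact shape of the car, not just a rectangle around it.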
Classification, detection, and segmentation are fundamental tasks, but computer vision encompasses much more. Other common tasks include object tracking (following objects across video frames), pose estimation (locating body joints of people or animals), optical character recognition (reading text in images), and face recognition (identifying individual people).
These are just brief definitions to give you a sense of the common goals in computer vision. As you progress, you'll learn more about how these tasks are approached and the techniques involved. For now, the key takeaway is that computer vision aims to extract specific types of information from visual data, and these tasks represent different levels of detail and types of understanding we might seek. In the next sections, we'll start setting up the tools needed to begin exploring these ideas practically.
© 2025 ApX Machine Learning