Moving beyond classifying entire images or drawing bounding boxes around objects, image segmentation aims for a pixel-level understanding of scene content. The goal is to assign a category label to every pixel in an image, providing a detailed mask that outlines the exact shape of objects.
In this chapter, you will learn the fundamental concepts and techniques for image segmentation using deep learning. We'll start by distinguishing between semantic segmentation (labeling pixels by category, like 'road' or 'person') and instance segmentation (labeling distinct object instances, like 'person 1' and 'person 2').
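To make the distinction concrete, here is a minimal sketch using plain Python lists as a toy 4x6 "image". The class ids (0 = background, 1 = road, 2 = person), the mask values, and the helper `count_pixels` are all hypothetical and chosen only for illustration; real pipelines usually store masks as arrays or tensors.

```python
# Semantic mask: every pixel gets a category id.
# Assumed ids for this toy example: 0 = background, 1 = road, 2 = person.
semantic_mask = [
    [0, 2, 0, 0, 2, 0],
    [0, 2, 0, 0, 2, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
]

# Instance mask for the 'person' class: 0 = no instance,
# 1 and 2 identify two separate people that share one category.
instance_mask = [
    [0, 1, 0, 0, 2, 0],
    [0, 1, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def count_pixels(mask, value):
    """Count how many pixels in a nested-list mask carry the given id."""
    return sum(row.count(value) for row in mask)


# Semantic view: all person pixels are indistinguishable from one another.
person_pixels = count_pixels(semantic_mask, 2)

# Instance view: the same pixels are split into distinct objects.
person_ids = sorted({v for row in instance_mask for v in row if v != 0})

print(person_pixels)  # total number of 'person' pixels
print(person_ids)     # ids of the individual people
```

The semantic mask can say how many pixels belong to 'person', but only the instance mask can say how many people there are.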
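You will study key architectures and methods developed specifically for this dense prediction task, including Fully Convolutional Networks (FCNs), encoder-decoder architectures such as U-Net and SegNet, dilated (atrous) convolutions, the DeepLab family with Atrous Spatial Pyramid Pooling, and Mask R-CNN for instance segmentation.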
We will also discuss common evaluation metrics used for segmentation tasks, such as Intersection over Union (IoU), often calculated as $J(A,B) = \frac{|A \cap B|}{|A \cup B|}$, and conclude with practical implementation exercises.
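As a rough illustration of the IoU metric, the sketch below computes it for two small binary masks stored as nested lists. The function name `iou`, the example masks, and the convention of returning 1.0 when both masks are empty are assumptions made for this example, not part of any particular library.

```python
def iou(pred_mask, true_mask):
    """Intersection over Union of two binary masks (nested lists of 0/1)."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1  # pixel is in both A and B
            if p or t:
                union += 1         # pixel is in A, in B, or in both
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Hypothetical predicted and ground-truth masks for a single class.
pred = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
true = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]

score = iou(pred, true)
print(f"IoU = {score:.3f}")  # 2 shared pixels out of 6 covered pixels
```

Here the masks overlap on 2 pixels and together cover 6, so the score is 2/6. Identical masks give 1.0 and disjoint masks give 0.0.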
4.1 Semantic Segmentation vs. Instance Segmentation
4.2 Fully Convolutional Networks for Segmentation
4.3 Encoder-Decoder Architectures: U-Net and SegNet
4.4 Dilated (Atrous) Convolutions for Segmentation
4.5 DeepLab Family: Atrous Spatial Pyramid Pooling
4.6 Instance Segmentation Approaches (Mask R-CNN)
4.7 Evaluation Metrics for Segmentation
4.8 Hands-on Practical: Building a Semantic Segmentation Model
© 2025 ApX Machine Learning