After an object detector proposes potential bounding boxes, it often generates multiple, highly overlapping boxes around the same object, each with a different confidence score. The role of Non-Maximum Suppression (NMS) is to clean up these redundant detections, leaving only the most confident, distinct boxes. While the standard greedy NMS algorithm is widely used, it can be suboptimal, especially in scenes with densely packed objects. This section examines variants of NMS designed to address these limitations.
Recall the standard greedy NMS procedure:
This approach is simple and computationally efficient. However, its main drawback lies in the hard suppression performed in step 5. If a correct detection box $B_j$ overlaps significantly with $B_{max}$ (e.g., two distinct objects very close to each other), it is removed entirely once its IoU exceeds $N_{threshold}$, even if it represents a different object instance. This can lower recall, particularly in crowded images.
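The greedy procedure and its hard suppression can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box format, and the sample boxes are all assumptions made for this example.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) tuples (assumed format)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the kept boxes, highest score first."""
    # Candidate indices sorted by descending confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # B_max: highest-scoring remaining box
        keep.append(best)
        # Hard suppression: every remaining box whose IoU with B_max exceeds
        # the threshold is dropped, whatever object it actually covers.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Hypothetical detections: box 1 overlaps box 0 heavily, box 2 moderately.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (4, 0, 14, 10), (30, 30, 40, 40)]
scores = [0.9, 0.8, 0.7, 0.6]
kept = greedy_nms(boxes, scores, iou_threshold=0.5)
```

In this toy input, box 1 is discarded because its IoU with box 0 is about 0.82. If box 1 actually belonged to a second, adjacent object, that detection would be lost, which is the recall problem described above.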
Comparison between Standard NMS and Soft-NMS when processing two overlapping boxes (A and B), where A has a higher initial score. Standard NMS removes Box B, while Soft-NMS retains Box B but reduces its confidence score.
Instead of eliminating overlapping boxes entirely, Soft-NMS proposes reducing their confidence scores based on the degree of overlap with a higher-scoring selected box. The intuition is that if a box has a high overlap with a very confident detection, it's less likely to be a correct detection itself, but it shouldn't necessarily be discarded completely, especially if it might represent a distinct, nearby object.
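The score $s_i$ of a box $B_i$ is updated based on its IoU with the selected highest-scoring box $B_{max}$:

$$s_i = s_i \times f(\text{IoU}(B_{max}, B_i))$$

The function $f(\cdot)$ is a penalty function that decreases as the IoU increases. Two common forms are:

Linear: $f(\text{IoU}) = 1$ if $\text{IoU} < N_{threshold}$, and $f(\text{IoU}) = 1 - \text{IoU}$ otherwise.

Gaussian: $f(\text{IoU}) = e^{-\text{IoU}^2 / \sigma}$

Here, $N_{threshold}$ is the standard NMS threshold (used to decide when to start applying the penalty in the linear case), and $\sigma$ is a parameter controlling the steepness of the Gaussian decay.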
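As a rough illustration, the linear and Gaussian penalties usually associated with Soft-NMS can be written as two small functions. The names `linear_penalty` and `gaussian_penalty` and the default values for `n_threshold` and `sigma` are assumptions chosen for this sketch, not values prescribed by the text.

```python
import math


def linear_penalty(iou, n_threshold=0.3):
    """Linear form: no penalty below the threshold, then scale by (1 - IoU)."""
    return 1.0 if iou < n_threshold else 1.0 - iou


def gaussian_penalty(iou, sigma=0.5):
    """Gaussian form: smooth decay exp(-IoU^2 / sigma), applied at every IoU."""
    return math.exp(-(iou ** 2) / sigma)


# Multiplier applied to a neighbor's score at a few sample overlaps.
overlaps = [0.0, 0.2, 0.5, 0.8]
linear_factors = [linear_penalty(o) for o in overlaps]
gaussian_factors = [gaussian_penalty(o) for o in overlaps]
```

The linear form has a discontinuity at the threshold, while the Gaussian form decays continuously from 1 as the overlap grows, with a smaller `sigma` giving a steeper decay.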
By decaying scores instead of eliminating boxes, Soft-NMS often improves Average Precision (AP), especially for datasets with significant object occlusion or density. The trade-off is a slight increase in computational cost compared to standard NMS and the introduction of a new parameter (σ for the Gaussian variant) that might need tuning.
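The score-decay loop itself might look like the following sketch, which uses the Gaussian penalty. The function name `soft_nms_gaussian`, the `(x1, y1, x2, y2)` box format, and the small `score_threshold` used to prune boxes whose score has decayed to almost nothing are assumptions for this example.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) tuples (assumed format)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms_gaussian(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Return (kept indices, their final scores) after Gaussian Soft-NMS."""
    scores = list(scores)  # work on a copy; the caller's list is untouched
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        # Scores change every round, so re-select the current maximum.
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        keep.append(best)
        # Decay the neighbors' scores instead of deleting them outright.
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
        # Assumed pruning step: drop boxes whose score has become negligible.
        remaining = [i for i in remaining if scores[i] >= score_threshold]
    return keep, [scores[i] for i in keep]


# Two heavily overlapping boxes, A (index 0) and B (index 1).
boxes = [(0, 0, 10, 10), (1, 0, 11, 10)]
kept, new_scores = soft_nms_gaussian(boxes, [0.9, 0.8])
```

With these inputs, box B survives but its score falls from 0.8 to roughly 0.21, so a later confidence cutoff, rather than the NMS step itself, decides whether it is reported. The extra cost comes from rescoring and re-selecting the maximum in every round.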
Standard NMS and Soft-NMS rely solely on the IoU metric. However, IoU doesn't consider the distance between the centers of the bounding boxes. Consider two scenarios: two boxes with high IoU because they tightly surround the same object, versus two boxes with the same high IoU but representing two distinct, adjacent objects whose boxes happen to overlap significantly. Standard NMS treats both cases identically.
DIoU-NMS incorporates the normalized distance between the central points of the two boxes into the suppression criterion. The core idea is that when suppressing a box $B_i$ based on $B_{max}$, not only the IoU but also the distance between their centers should be considered. If the centers are far apart, $B_i$ is less likely to be a redundant detection of the same object as $B_{max}$, even if their IoU is high.
The DIoU metric itself penalizes the distance between centers. In DIoU-NMS, the suppression condition is modified: instead of checking $\text{IoU}(B_{max}, B_i) > N_{threshold}$, the same comparison is made on the DIoU value, which is defined as:

$$\text{DIoU} = \text{IoU} - \frac{\rho^2(b, b^{gt})}{c^2}$$

where $\rho(\cdot)$ is the Euclidean distance, $b$ and $b^{gt}$ are the center points of the two boxes (in NMS, the centers of $B_{max}$ and $B_i$), and $c$ is the diagonal length of the smallest enclosing box covering both boxes. When used in NMS, the penalty term $\rho^2(b, b^{gt}) / c^2$ is subtracted from the IoU before the comparison with the threshold. Boxes with high IoU but distant centers therefore receive a lower DIoU value and are less likely to be suppressed.
This makes DIoU-NMS particularly effective at preserving correct detections for distinct but nearby objects, leading to better performance in crowded scenes compared to standard NMS.
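A minimal sketch of this idea follows, assuming the suppression test is simply "DIoU greater than the threshold". The function names `diou` and `diou_nms`, the `(x1, y1, x2, y2)` box format, and the wide, flat sample boxes are hypothetical choices made so that the center-distance term visibly changes the outcome.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) tuples (assumed format)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def diou(a, b):
    """IoU minus squared center distance over squared enclosing diagonal."""
    # rho^2: squared Euclidean distance between the two box centers.
    rho2 = (((a[0] + a[2]) - (b[0] + b[2])) / 2) ** 2 \
         + (((a[1] + a[3]) - (b[1] + b[3])) / 2) ** 2
    # c^2: squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
       + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou(a, b) - (rho2 / c2 if c2 > 0 else 0.0)


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that suppresses on DIoU instead of plain IoU."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if diou(boxes[best], boxes[i]) <= threshold]
    return keep


# Box 1 is a near-duplicate of box 0; box 2 is shifted sideways by 3 units.
boxes = [(0, 0, 10, 2), (0.5, 0, 10.5, 2), (3, 0, 13, 2)]
scores = [0.9, 0.85, 0.8]
kept = diou_nms(boxes, scores, threshold=0.5)
```

Here box 2 has an IoU of about 0.54 with box 0, so plain greedy NMS at a threshold of 0.5 would remove it. Its DIoU is about 0.49 because of the center offset, so it is kept, while the near-duplicate box 1 (DIoU about 0.90) is still suppressed.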
The choice between standard NMS and its variants depends on the specific application and dataset characteristics:
Regardless of the variant, the IoU threshold ($N_{threshold}$) remains a critical hyperparameter. A lower threshold leads to more aggressive suppression (fewer, more distinct boxes), while a higher threshold is more permissive (potentially more overlapping boxes and false positives). Similarly, the initial confidence score threshold used to filter boxes before NMS significantly impacts the input to the NMS algorithm. Optimal values for these thresholds are typically determined empirically by evaluating performance on a validation dataset.
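The effect of both thresholds can be seen on a toy example. The sketch below applies a confidence pre-filter followed by greedy NMS at three IoU thresholds; the helper name `filter_and_nms`, the box format, and all numeric values are assumptions for illustration only.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) tuples (assumed format)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, score_threshold, iou_threshold):
    """Drop low-confidence boxes, then run greedy NMS on the rest."""
    candidates = [i for i in range(len(boxes)) if scores[i] >= score_threshold]
    order = sorted(candidates, key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (4, 0, 14, 10), (30, 30, 40, 40)]
scores = [0.9, 0.8, 0.7, 0.05]  # the last box falls below the score cutoff

# Same detections, three IoU thresholds, score cutoff fixed at 0.1.
kept_by_threshold = {
    t: filter_and_nms(boxes, scores, score_threshold=0.1, iou_threshold=t)
    for t in (0.3, 0.5, 0.9)
}
```

On this input, the strictest threshold keeps a single box, the middle one keeps two, and the most permissive one keeps all three overlapping boxes, while the low-confidence box never reaches NMS at all. In practice the same kind of sweep would be scored against validation-set metrics rather than box counts.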
In summary, while greedy NMS provides a fundamental mechanism for refining object detections, variants like Soft-NMS and DIoU-NMS offer refined strategies to handle the complexities of overlapping objects, often leading to measurable improvements in detection accuracy, particularly in challenging, dense scenarios. Understanding these alternatives allows for better tailoring of the post-processing stage to the specific needs of your object detection task.
© 2025 ApX Machine Learning