Computer vision models, while powerful, operate on high-dimensional pixel inputs, which makes them particularly sensitive to adversarial perturbations. Small, carefully crafted changes to an image, often imperceptible to humans, can cause sophisticated models such as Convolutional Neural Networks (CNNs) to misclassify objects, fail to detect them, or assign incorrect labels to image regions. This section examines how adversarial attacks manifest and are adapted for core computer vision tasks.
Building on the foundational attack methodologies discussed previously (Chapter 2), such as Projected Gradient Descent (PGD) and Carlini & Wagner (C&W) attacks, we now focus on their application and refinement within the vision domain.
A central theme in attacking vision models is maintaining perceptual similarity. An attack is generally considered more effective or dangerous if the resulting adversarial image looks identical or very similar to the original image to a human observer. Common mathematical constraints used to achieve this bound the perturbation with an Lp norm: the L-infinity norm limits the maximum change to any single pixel, the L2 norm limits the overall Euclidean magnitude of the perturbation, and the L0 norm limits how many pixels are modified at all.
While Lp norms are computationally convenient, they do not always align well with human perception. Research continues into attacks based on more perceptually aligned metrics, such as the Structural Similarity Index Measure (SSIM), and into techniques that manipulate images in the frequency domain.
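As a concrete illustration, the sketch below shows a minimal L-infinity-bounded PGD attack in PyTorch. It assumes a classifier that returns logits and image batches scaled to [0, 1]; the eps, alpha, and step values are common illustrative choices, not prescribed settings.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Untargeted PGD attack bounded in the L-infinity norm.

    x: batch of images in [0, 1], y: true labels,
    eps: maximum per-pixel perturbation, alpha: step size.
    """
    x_adv = x.clone().detach()
    # A random start inside the epsilon ball often helps escape flat regions.
    x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)).clamp(0, 1)

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]

        with torch.no_grad():
            # Ascend the loss, then project back into the L-infinity ball.
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
            x_adv = x_adv.clamp(0, 1)

    return x_adv.detach()
```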
Adversarial attacks extend beyond simple image classification. They pose significant threats to more complex vision tasks such as object detection and semantic segmentation.
Object detectors identify multiple objects within an image and draw bounding boxes around them. Attacks aim to disrupt this process in several ways: suppressing detections so that objects effectively disappear, changing the predicted class of a detected object, shifting or distorting bounding box coordinates, or fabricating spurious detections where no object exists.
Creating effective attacks often requires modifying the loss function used during attack optimization. Instead of simply maximizing the classification loss for a single output, attackers might target losses associated with object presence (objectness) scores, class probabilities within detected boxes, or bounding box regression outputs.
Diagram: an adversarial perturbation modifies the output of an object detector, causing missed detections or misclassifications.
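The sketch below shows one way such a composite detection-attack loss might look. The detector interface is hypothetical: `raw_outputs` is assumed to expose pre-NMS objectness logits and per-proposal class logits, which real detector implementations structure differently.

```python
import torch

def detector_attack_loss(raw_outputs, suppress_weight=1.0, misclass_weight=1.0):
    """Composite loss for a 'vanishing' style attack on a detector.

    `raw_outputs` is assumed (hypothetically) to be a dict of pre-NMS tensors:
      'objectness'  : (N,) logits indicating object presence
      'class_logits': (N, C) per-proposal class scores
    """
    objectness = raw_outputs['objectness']
    class_logits = raw_outputs['class_logits']

    # Push presence scores toward "no object".
    suppress_term = torch.sigmoid(objectness).mean()

    # Reduce confidence in the currently predicted class of each proposal.
    probs = class_logits.softmax(dim=-1)
    top_conf = probs.max(dim=-1).values.mean()

    # The attacker minimizes this with respect to the perturbation,
    # e.g. inside a PGD loop like the one sketched earlier.
    return suppress_weight * suppress_term + misclass_weight * top_conf
```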
Semantic segmentation models assign a class label to every pixel in an image. Attacks against these models aim to cause misclassification at the pixel level. This can manifest as large regions of the segmentation map being mislabeled, specific objects blending into their surroundings because their pixels are relabeled as background, or the pixels of one class being systematically remapped to another.
Attack generation often involves maximizing a pixel-wise classification loss (e.g., cross-entropy averaged over all pixels) subject to perceptual constraints (Lp norms). Targeted attacks might try to change all pixels belonging to one class (e.g., 'road') into another specific class (e.g., 'water').
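A minimal sketch of such a targeted, class-remapping loss is shown below. It assumes a PyTorch segmentation model that returns per-pixel logits of shape (B, C, H, W); the masking strategy and the integer class ids are illustrative.

```python
import torch
import torch.nn.functional as F

def targeted_segmentation_loss(model, x_adv, source_class, target_class):
    """Loss for remapping one semantic class to another, e.g. 'road' -> 'water'.

    Assumes `model(x_adv)` returns per-pixel logits of shape (B, C, H, W)
    and that source_class / target_class are integer class ids.
    """
    logits = model(x_adv)                 # (B, C, H, W)
    preds = logits.argmax(dim=1)          # current per-pixel predictions

    # Only penalize pixels currently assigned to the source class.
    mask = (preds == source_class)
    if mask.sum() == 0:
        return logits.sum() * 0.0         # nothing left to remap

    target = torch.full_like(preds, target_class)
    per_pixel = F.cross_entropy(logits, target, reduction='none')  # (B, H, W)

    # Minimizing this pulls the masked pixels toward the target class.
    return per_pixel[mask].mean()
```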
Techniques like transfer attacks, where perturbations crafted for one model successfully fool another, are highly relevant in computer vision due to the prevalence of standard architectures (like ResNet, VGG) and pre-trained models. An attacker might generate perturbations against a locally available surrogate model and then use those perturbations against a target black-box vision API.
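The sketch below outlines this workflow with two pretrained torchvision classifiers, using ResNet-50 as the local surrogate and VGG-16 standing in for the black-box target; the data loader providing `images` and `labels`, and the eps/alpha/step settings, are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms

# Craft perturbations on a local surrogate and check whether they also fool
# a different architecture that stands in for a remote black-box target.
surrogate = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
target = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

def transfer_rate(images, labels, eps=8/255, alpha=2/255, steps=10):
    """Returns the fraction of adversarial images that fool the unseen target."""
    x_adv = images.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(surrogate(normalize(x_adv)), labels)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = (x_adv + alpha * grad.sign()).clamp(images - eps, images + eps)
            x_adv = x_adv.clamp(0, 1)

    with torch.no_grad():
        preds = target(normalize(x_adv)).argmax(dim=1)
    return (preds != labels).float().mean().item()
```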
Similarly, score-based and decision-based attacks, which rely only on model outputs (confidence scores or final labels), can be applied. However, the high dimensionality of image inputs often makes these attacks computationally expensive, requiring a large number of queries to the target model. Efficient query strategies are an active area of research for attacking vision systems in black-box settings.
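To make the query cost concrete, here is a minimal sketch of a SimBA-style score-based attack that perturbs one pixel coordinate at a time and keeps only changes that lower the true-class probability. The `query_fn` interface is hypothetical, standing in for whatever confidence scores the target system exposes.

```python
import torch

def simple_score_based_attack(query_fn, x, y, eps=0.05, max_queries=10000):
    """Minimal SimBA-style score-based attack on a single image.

    `query_fn(x)` is assumed to return a 1-D tensor of class probabilities
    for an image `x` of shape (C, H, W); no gradients are used. Each
    iteration tries a +/- eps change along one random pixel coordinate and
    keeps it if the true-class probability drops. The query count grows
    quickly with image dimensionality, which is the main cost of such attacks.
    """
    x_adv = x.clone()
    best_prob = query_fn(x_adv)[y].item()
    queries = 1

    dims = x_adv.numel()
    perm = torch.randperm(dims)

    for i in range(min(dims, max_queries // 2)):
        direction = torch.zeros(dims)
        direction[perm[i % dims]] = eps
        direction = direction.view_as(x_adv)

        for sign in (+1.0, -1.0):
            candidate = (x_adv + sign * direction).clamp(0, 1)
            prob = query_fn(candidate)[y].item()
            queries += 1
            if prob < best_prob:
                x_adv, best_prob = candidate, prob
                break
        if queries >= max_queries:
            break

    return x_adv, queries
```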
Understanding these CV-specific attack vectors is essential for developing robust models and defenses tailored to the unique challenges posed by image data. The high-dimensional nature of images and the intricacies of human perception make computer vision a fertile ground for adversarial manipulation.