Computer vision models, while powerful, operate on high-dimensional pixel inputs, which makes them particularly sensitive to adversarial perturbations. Small, carefully crafted changes to an image, often imperceptible to humans, can cause sophisticated models such as Convolutional Neural Networks (CNNs) to misclassify objects, fail to detect them, or assign incorrect labels to image regions. This section examines how adversarial attacks manifest and are adapted for core computer vision tasks.
Building on the foundational attack methodologies discussed previously (Chapter 2), such as Projected Gradient Descent (PGD) and Carlini & Wagner (C&W) attacks, we now focus on their application and refinement within the vision domain.
A central theme in attacking vision models is maintaining perceptual similarity. An attack is generally considered more effective or dangerous if the resulting adversarial image looks identical or very similar to the original image to a human observer. Common mathematical constraints used to achieve this bound the perturbation with an Lp norm: the L-infinity norm limits the maximum change to any single pixel, the L2 norm limits the overall Euclidean magnitude of the perturbation, and the L0 norm limits how many pixels are modified at all.
While Lp norms are computationally convenient, they do not always align well with human perception. Research continues into attacks based on more perceptually aligned metrics, such as the Structural Similarity Index Measure (SSIM), and into techniques that manipulate images in the frequency domain.
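As a concrete illustration, the sketch below shows a minimal L-infinity-bounded PGD attack in PyTorch. It assumes a classifier that returns logits and image batches scaled to [0, 1]; the eps, alpha, and step values are common illustrative choices, not prescribed settings.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Untargeted PGD attack bounded in the L-infinity norm.

    x: batch of images in [0, 1], y: true labels,
    eps: maximum per-pixel perturbation, alpha: step size.
    """
    x_adv = x.clone().detach()
    # A random start inside the epsilon ball often helps escape flat regions.
    x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)).clamp(0, 1)

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]

        with torch.no_grad():
            # Ascend the loss, then project back into the L-infinity ball.
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
            x_adv = x_adv.clamp(0, 1)

    return x_adv.detach()
```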
Adversarial attacks extend beyond simple image classification. They pose significant threats to more complex vision tasks such as object detection and semantic segmentation.
Object detectors identify multiple objects within an image and draw bounding boxes around them. Attacks aim to disrupt this process in several ways: suppressing detections so that objects effectively disappear, changing the predicted class of a detected object, shifting or distorting bounding box coordinates, or fabricating spurious detections where no object exists.
Creating effective attacks often requires modifying the loss function used during attack optimization. Instead of simply maximizing the classification loss for a single output, attackers might target losses associated with object presence (objectness) scores, class probabilities within detected boxes, or bounding box regression outputs.
Diagram: an adversarial perturbation modifies the output of an object detector, causing missed detections or misclassifications.
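The sketch below shows one way such a composite detection-attack loss might look. The detector interface is hypothetical: `raw_outputs` is assumed to expose pre-NMS objectness logits and per-proposal class logits, which real detector implementations structure differently.

```python
import torch

def detector_attack_loss(raw_outputs, suppress_weight=1.0, misclass_weight=1.0):
    """Composite loss for a 'vanishing' style attack on a detector.

    `raw_outputs` is assumed (hypothetically) to be a dict of pre-NMS tensors:
      'objectness'  : (N,) logits indicating object presence
      'class_logits': (N, C) per-proposal class scores
    """
    objectness = raw_outputs['objectness']
    class_logits = raw_outputs['class_logits']

    # Push presence scores toward "no object".
    suppress_term = torch.sigmoid(objectness).mean()

    # Reduce confidence in the currently predicted class of each proposal.
    probs = class_logits.softmax(dim=-1)
    top_conf = probs.max(dim=-1).values.mean()

    # The attacker minimizes this with respect to the perturbation,
    # e.g. inside a PGD loop like the one sketched earlier.
    return suppress_weight * suppress_term + misclass_weight * top_conf
```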
Semantic segmentation models assign a class label to every pixel in an image. Attacks against these models aim to cause misclassification at the pixel level. This can manifest as large regions of the segmentation map being mislabeled, specific objects blending into their surroundings because their pixels are relabeled as background, or the pixels of one class being systematically remapped to another.
Attack generation often involves maximizing a pixel-wise classification loss (e.g., cross-entropy averaged over all pixels) subject to perceptual constraints (Lp norms). Targeted attacks might try to change all pixels belonging to one class (e.g., 'road') into another specific class (e.g., 'water').
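A minimal sketch of such a targeted, class-remapping loss is shown below. It assumes a PyTorch segmentation model that returns per-pixel logits of shape (B, C, H, W); the masking strategy and the integer class ids are illustrative.

```python
import torch
import torch.nn.functional as F

def targeted_segmentation_loss(model, x_adv, source_class, target_class):
    """Loss for remapping one semantic class to another, e.g. 'road' -> 'water'.

    Assumes `model(x_adv)` returns per-pixel logits of shape (B, C, H, W)
    and that source_class / target_class are integer class ids.
    """
    logits = model(x_adv)                 # (B, C, H, W)
    preds = logits.argmax(dim=1)          # current per-pixel predictions

    # Only penalize pixels currently assigned to the source class.
    mask = (preds == source_class)
    if mask.sum() == 0:
        return logits.sum() * 0.0         # nothing left to remap

    target = torch.full_like(preds, target_class)
    per_pixel = F.cross_entropy(logits, target, reduction='none')  # (B, H, W)

    # Minimizing this pulls the masked pixels toward the target class.
    return per_pixel[mask].mean()
```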
Techniques like transfer attacks, where perturbations crafted for one model successfully fool another, are highly relevant in computer vision due to the prevalence of standard architectures (like ResNet, VGG) and pre-trained models. An attacker might generate perturbations against a locally available surrogate model and then use those perturbations against a target black-box vision API.
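The sketch below outlines this workflow with two pretrained torchvision classifiers, using ResNet-50 as the local surrogate and VGG-16 standing in for the black-box target; the data loader providing `images` and `labels`, and the eps/alpha/step settings, are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms

# Craft perturbations on a local surrogate and check whether they also fool
# a different architecture that stands in for a remote black-box target.
surrogate = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
target = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

def transfer_rate(images, labels, eps=8/255, alpha=2/255, steps=10):
    """Returns the fraction of adversarial images that fool the unseen target."""
    x_adv = images.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(surrogate(normalize(x_adv)), labels)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = (x_adv + alpha * grad.sign()).clamp(images - eps, images + eps)
            x_adv = x_adv.clamp(0, 1)

    with torch.no_grad():
        preds = target(normalize(x_adv)).argmax(dim=1)
    return (preds != labels).float().mean().item()
```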
Similarly, score-based and decision-based attacks, which rely only on model outputs (confidence scores or final labels), can be applied. However, the high dimensionality of image inputs often makes these attacks computationally expensive, requiring a large number of queries to the target model. Efficient query strategies are an active area of research for attacking vision systems in black-box settings.
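To make the query cost concrete, here is a minimal sketch of a SimBA-style score-based attack that perturbs one pixel coordinate at a time and keeps only changes that lower the true-class probability. The `query_fn` interface is hypothetical, standing in for whatever confidence scores the target system exposes.

```python
import torch

def simple_score_based_attack(query_fn, x, y, eps=0.05, max_queries=10000):
    """Minimal SimBA-style score-based attack on a single image.

    `query_fn(x)` is assumed to return a 1-D tensor of class probabilities
    for an image `x` of shape (C, H, W); no gradients are used. Each
    iteration tries a +/- eps change along one random pixel coordinate and
    keeps it if the true-class probability drops. The query count grows
    quickly with image dimensionality, which is the main cost of such attacks.
    """
    x_adv = x.clone()
    best_prob = query_fn(x_adv)[y].item()
    queries = 1

    dims = x_adv.numel()
    perm = torch.randperm(dims)

    for i in range(min(dims, max_queries // 2)):
        direction = torch.zeros(dims)
        direction[perm[i % dims]] = eps
        direction = direction.view_as(x_adv)

        for sign in (+1.0, -1.0):
            candidate = (x_adv + sign * direction).clamp(0, 1)
            prob = query_fn(candidate)[y].item()
            queries += 1
            if prob < best_prob:
                x_adv, best_prob = candidate, prob
                break
        if queries >= max_queries:
            break

    return x_adv, queries
```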
Understanding these CV-specific attack vectors is essential for developing robust models and defenses tailored to the unique challenges posed by image data. The high-dimensional nature of images and the intricacies of human perception make computer vision a fertile ground for adversarial manipulation.