As established, evasion attacks aim to fool a model during inference by adding carefully constructed noise to the input. Gradient-based attacks are a fundamental and widely studied category of such attacks. They operate on a simple yet effective principle: modify the input in the direction that maximally increases the model's loss for the correct label, thereby pushing the model towards a misclassification. This requires access to the model's gradients with respect to the input, making these primarily white-box attacks. We'll now examine three foundational gradient-based methods: FGSM, BIM, and PGD.
Fast Gradient Sign Method (FGSM)
The Fast Gradient Sign Method is one of the earliest and simplest methods for generating adversarial examples. Proposed by Goodfellow et al. in 2014, FGSM performs a single step of gradient ascent on the loss function with respect to the input features. The core idea is that moving the input slightly in the direction indicated by the sign of the gradient is an efficient way to increase the loss and potentially cause misclassification.
Let $J(\theta, x, y)$ be the loss function used to train the model with parameters $\theta$, input $x$, and true label $y$. The gradient $\nabla_x J(\theta, x, y)$ indicates the direction in input space in which the loss increases fastest. FGSM crafts the adversarial example $x_{\text{adv}}$ as follows:
$x_{\text{adv}} = x + \epsilon \cdot \text{sign}\left(\nabla_x J(\theta, x, y)\right)$
Here:
- $\epsilon$ is a small scalar controlling the magnitude of the perturbation under the $L_\infty$ norm. It determines the maximum change allowed for any single feature (e.g., pixel value).
- $\text{sign}(\cdot)$ is the sign function, which returns +1 if the input is positive, -1 if negative, and 0 if the input is zero. Applying the sign function gives the perturbation a fixed magnitude of $\epsilon$ in every dimension where the gradient is non-zero, effectively maximizing the change allowed under the $L_\infty$ constraint.
Properties of FGSM:
- Speed: It requires only one gradient calculation, making it very fast.
- Simplicity: The concept and implementation are straightforward.
- Effectiveness: While simple, it can be surprisingly effective against undefended models.
- Suboptimality: The single large step might not be optimal. It can overshoot the decision boundary or land in a region where the model still classifies correctly, and the resulting perturbation can be larger and more visible than necessary.
FGSM serves as an important baseline attack and is also a component in some defense strategies like basic adversarial training.
```python
# Python snippet for FGSM (using a framework like PyTorch)
import torch

def fgsm_attack(model, loss_fn, image, label, epsilon):
    # Work on a detached copy and track gradients w.r.t. the image
    image = image.clone().detach().requires_grad_(True)
    output = model(image)
    loss = loss_fn(output, label)
    model.zero_grad()   # Clear previously accumulated gradients
    loss.backward()     # Compute the gradient of the loss w.r.t. the input
    # Take the sign of the input gradient
    sign_gradient = image.grad.sign()
    # Apply the single FGSM step
    perturbed_image = image + epsilon * sign_gradient
    # Clamp values to the valid input range (e.g., [0, 1])
    perturbed_image = torch.clamp(perturbed_image, 0, 1)
    # Detach the result from the computation graph
    return perturbed_image.detach()
```
Basic Iterative Method (BIM) / Iterative FGSM (I-FGSM)
Recognizing that FGSM's single large step might be suboptimal, Kurakin et al. (2016) proposed an iterative version, often called the Basic Iterative Method (BIM) or Iterative FGSM (I-FGSM). Instead of one large step of size ϵ, BIM applies the FGSM step multiple times with a smaller step size α, clipping the result after each step to ensure the total perturbation remains within the ϵ-ball around the original input x.
The process for N iterations is:
- Initialize: $x_{\text{adv}}^{(0)} = x$
- Iterate $i$ from $0$ to $N-1$:
$x_{\text{adv}}^{(i+1)} = \text{Clip}_{x,\epsilon}\left(x_{\text{adv}}^{(i)} + \alpha \cdot \text{sign}\left(\nabla_x J(\theta, x_{\text{adv}}^{(i)}, y)\right)\right)$
Here:
- α is the step size for each iteration, typically chosen as ϵ/N or a small fixed value.
- $\text{Clip}_{x,\epsilon}(\cdot)$ is an element-wise clipping function that ensures the resulting $x_{\text{adv}}^{(i+1)}$ remains within the $L_\infty$ neighborhood of radius $\epsilon$ around the original input $x$. Specifically, for each element $j$:
$\left(x_{\text{adv}}^{(i+1)}\right)_j = \max\left(\min\left(\left(x_{\text{adv}}^{(i+1)}\right)_j,\; x_j + \epsilon\right),\; x_j - \epsilon\right)$
Often, an additional clipping step is needed to ensure the pixel values stay within the valid range (e.g., [0, 1] or [0, 255]).
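To make the iteration concrete, below is a minimal $L_\infty$ BIM sketch in PyTorch. It is an illustrative implementation rather than reference code: the function name bim_attack, the fixed step size alpha, and the assumption of inputs in [0, 1] are choices made for this example.

```python
# Minimal L-infinity BIM / I-FGSM sketch (illustrative; assumes inputs in [0, 1])
import torch

def bim_attack(model, loss_fn, image, label, epsilon, alpha, num_iter):
    original = image.clone().detach()
    x_adv = original.clone()
    for _ in range(num_iter):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = loss_fn(model(x_adv), label)
        model.zero_grad()
        loss.backward()
        # FGSM-style step with the smaller per-iteration step size alpha
        x_adv = x_adv + alpha * x_adv.grad.sign()
        # Clip back into the epsilon-ball around the original input ...
        x_adv = torch.max(torch.min(x_adv, original + epsilon), original - epsilon)
        # ... and into the valid pixel range
        x_adv = torch.clamp(x_adv, 0, 1).detach()
    return x_adv
```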
Properties of BIM:
- Effectiveness: Generally finds more effective adversarial examples than FGSM for the same ϵ, often requiring smaller perturbations to achieve misclassification.
- Computational Cost: Requires multiple gradient calculations (one per iteration), making it slower than FGSM.
- Step Size Tuning: The choice of α and N can influence the attack's success and speed.
BIM demonstrates that iterative refinement significantly improves the potency of gradient-based attacks.
Projected Gradient Descent (PGD)
Projected Gradient Descent, introduced by Madry et al. (2017), is arguably the most widely used and powerful first-order iterative attack. It builds upon BIM but incorporates a few refinements that make it a stronger adversary, particularly against defenses. PGD is considered a standard benchmark for evaluating model robustness.
The key steps are:
- Random Start: Initialize the adversarial example $x_{\text{adv}}^{(0)}$ by adding a small random perturbation to the original input $x$, typically drawn uniformly from $[-\epsilon, \epsilon]$ and then projected onto the $\epsilon$-ball around $x$:
$x_{\text{adv}}^{(0)} = \Pi_{B(x,\epsilon)}(x + \delta_{\text{random}})$
where $\delta_{\text{random}}$ is the random noise and $\Pi_{B(x,\epsilon)}$ denotes projection onto the $L_p$-ball centered at $x$ with radius $\epsilon$.
- Iterative Gradient Ascent: For i from 0 to N−1:
a. Calculate the gradient: $g = \nabla_x J(\theta, x_{\text{adv}}^{(i)}, y)$
b. Update the example using a signed gradient step:
$x' = x_{\text{adv}}^{(i)} + \alpha \cdot \text{sign}(g)$ (for the $L_\infty$ norm)
(For the $L_2$ norm, the update is often $x' = x_{\text{adv}}^{(i)} + \alpha \cdot g / \lVert g \rVert_2$.)
c. Project back onto the $\epsilon$-ball:
$x_{\text{adv}}^{(i+1)} = \Pi_{B(x,\epsilon)}(x')$
d. (Optional but common) Clip values to the valid input range (e.g., [0, 1]).
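Putting these steps together, a compact $L_\infty$ PGD sketch in PyTorch might look as follows. As with the BIM sketch, the function name pgd_attack, the uniform random start, and the [0, 1] input range are assumptions made for illustration.

```python
# Minimal L-infinity PGD sketch (illustrative; assumes inputs in [0, 1])
import torch

def pgd_attack(model, loss_fn, image, label, epsilon, alpha, num_iter):
    original = image.clone().detach()
    # Random start: uniform noise in [-epsilon, epsilon] around the original input
    x_adv = original + torch.empty_like(original).uniform_(-epsilon, epsilon)
    x_adv = torch.clamp(x_adv, 0, 1)
    for _ in range(num_iter):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = loss_fn(model(x_adv), label)
        model.zero_grad()
        loss.backward()
        # Signed gradient ascent step
        x_adv = x_adv + alpha * x_adv.grad.sign()
        # Project back onto the L-infinity epsilon-ball around the original input
        x_adv = torch.max(torch.min(x_adv, original + epsilon), original - epsilon)
        # Keep pixel values in the valid input range
        x_adv = torch.clamp(x_adv, 0, 1).detach()
    return x_adv
```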
Key Differences from BIM:
- Random Start: This is a significant difference. Starting from a random point within the ϵ-ball helps PGD avoid getting stuck in poor local optima near the original input's decision boundary. It makes the attack more robust against defenses that might try to smooth the loss landscape locally.
- Projection: The projection step $\Pi_{B(x,\epsilon)}$ is explicitly formulated, ensuring the iterate always stays within the allowed perturbation bounds defined by the chosen $L_p$ norm. For $L_\infty$, this projection is equivalent to the element-wise clipping used in BIM. For $L_2$, it involves rescaling the perturbation vector whenever its norm exceeds $\epsilon$.
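To illustrate the difference between the two projections, here is a hypothetical helper (the name project and its signature are not from any specific library) that handles batched inputs for both norms:

```python
# Illustrative projection onto the epsilon-ball around x for L-infinity or L2
import torch

def project(x_adv, x, epsilon, norm="linf"):
    delta = x_adv - x
    if norm == "linf":
        # Element-wise clipping, equivalent to BIM's Clip step
        delta = torch.clamp(delta, -epsilon, epsilon)
    elif norm == "l2":
        # Rescale the perturbation per example if its L2 norm exceeds epsilon
        flat = delta.flatten(start_dim=1)
        norms = flat.norm(p=2, dim=1, keepdim=True).clamp(min=1e-12)
        factor = torch.clamp(epsilon / norms, max=1.0)
        delta = (flat * factor).view_as(delta)
    return x + delta
```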
Properties of PGD:
- Strength: Considered one of the strongest first-order attacks. Models robust to PGD are often robust to many other gradient-based attacks.
- Benchmark Standard: Widely used for evaluating adversarial defenses and as the attack method within PGD Adversarial Training (PGD-AT).
- Flexibility: Can be adapted for different Lp norms (L∞, L2) by changing the update step and projection method.
- Computational Cost: Similar to BIM, it requires multiple iterations and gradient computations.
The random start and iterative projection make PGD a formidable baseline for assessing how well a model can withstand gradient-based evasion attempts.
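Because PGD also serves as the inner attack in PGD adversarial training (PGD-AT), a training step often simply wraps the attack around an ordinary update. The sketch below assumes the pgd_attack helper above plus an existing model, loss_fn, optimizer, and train_loader; it is a schematic loop, not a complete training recipe.

```python
# Schematic PGD-AT training step (assumes pgd_attack and standard training objects exist)
eps, alpha, steps = 8 / 255, 2 / 255, 10  # illustrative L-infinity budget and schedule

for images, labels in train_loader:
    # Generate adversarial examples on the fly with the current model parameters
    adv_images = pgd_attack(model, loss_fn, images, labels, eps, alpha, steps)
    optimizer.zero_grad()
    loss = loss_fn(model(adv_images), labels)  # train on the perturbed batch
    loss.backward()
    optimizer.step()
```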
Figure: Comparison of FGSM's single step versus the iterative update and projection/clipping loop in BIM and PGD; PGD typically adds a random initialization before the loop.
Analysis and Comparison
| Feature | FGSM | BIM / I-FGSM | PGD |
| --- | --- | --- | --- |
| Iterations | 1 | Multiple (N) | Multiple (N) |
| Step Size | Large (ϵ) | Small (α < ϵ) | Small (α < ϵ) |
| Constraint | Implicit via sign & ϵ | Explicit clipping per step | Explicit projection per step |
| Start Point | Original input x | Original input x | Random perturbation around x |
| Speed | Very fast | Moderate | Moderate |
| Strength | Baseline | Stronger than FGSM | Very strong / benchmark standard |
| Complexity | Low | Moderate | Moderate |
While FGSM provides a quick assessment, BIM improves effectiveness through iteration. PGD further enhances this by adding a random start and formal projection, making it a more reliable method for finding adversarial examples and evaluating defenses. The choice between L∞ and L2 norms also impacts the attack: L∞ PGD tends to create small changes across many pixels, while L2 PGD might create more localized but potentially larger changes, resulting in visually different perturbations. Understanding these gradient-based methods is foundational before examining more sophisticated optimization-based or query-based attacks.
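As a closing usage sketch, robustness is often reported as accuracy under attack. The helper below is hypothetical and assumes the fgsm_attack and pgd_attack functions defined earlier, plus an existing model, loss_fn, and test_loader:

```python
# Hypothetical evaluation: accuracy on adversarially perturbed test data
import torch

def accuracy_under_attack(model, loss_fn, loader, attack_fn):
    correct, total = 0, 0
    for images, labels in loader:
        adv = attack_fn(model, loss_fn, images, labels)
        with torch.no_grad():
            preds = model(adv).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

eps, alpha, steps = 8 / 255, 2 / 255, 10  # illustrative values
fgsm_acc = accuracy_under_attack(
    model, loss_fn, test_loader,
    lambda m, lf, x, y: fgsm_attack(m, lf, x, y, eps))
pgd_acc = accuracy_under_attack(
    model, loss_fn, test_loader,
    lambda m, lf, x, y: pgd_attack(m, lf, x, y, eps, alpha, steps))
```

A noticeably lower accuracy under PGD than under FGSM is a common indication that the stronger iterative attack is needed for a fair estimate of robustness.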