Now that we have examined the theoretical underpinnings of several advanced evasion attacks, it's time to put theory into practice. This hands-on section guides you through implementing two significant evasion attacks discussed earlier: Projected Gradient Descent (PGD) and Carlini & Wagner (C&W). We will use the Adversarial Robustness Toolbox (ART) library, a popular Python framework for evaluating the security of machine learning models. ART provides convenient abstractions for both attacks and defenses and integrates well with common deep learning frameworks such as PyTorch and TensorFlow.

Working through these examples will solidify your understanding of how these attacks generate adversarial examples and how their parameters influence the outcome. We assume you have a Python environment set up with PyTorch (or TensorFlow) and ART installed.

Environment Setup

First, ensure you have the necessary libraries installed. We'll use ART with PyTorch in this example. You can install everything with pip:

```bash
pip install adversarial-robustness-toolbox[pytorch] torch torchvision
```

We also need a trained model and some data. For simplicity, we use a simple convolutional neural network (CNN) pre-trained on the MNIST dataset. You would typically load your own trained model, but ART also provides utilities for common datasets and basic models that are helpful for experimentation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define a simple CNN model (example)
class SimpleMNISTCNN(nn.Module):
    def __init__(self):
        super(SimpleMNISTCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(7*7*64, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = x.view(-1, 7*7*64)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)  # ART classifiers expect logit outputs
        return x

# Load MNIST test data
transform = transforms.Compose([transforms.ToTensor()])
test_dataset = datasets.MNIST('./data', train=False, download=True, transform=transform)
test_loader = DataLoader(test_dataset, batch_size=100, shuffle=False)

# --- IMPORTANT ---
# For this practical, assume 'model' is a pre-trained instance of SimpleMNISTCNN
# and is set to evaluation mode: model.eval()
# For example:
# model = SimpleMNISTCNN()
# model.load_state_dict(torch.load('path/to/your/trained_mnist_cnn.pth'))
# model.eval()
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# model.to(device)
#
# You need to replace this with loading your actual trained model.
# For demonstration purposes, we proceed assuming 'model' exists.
# The placeholder below is untrained and exists only to keep the code runnable.
# !!! Replace this with your actual trained model loading !!!
model = SimpleMNISTCNN()
model.eval()
device = torch.device("cpu")  # Use CPU for this example structure
model.to(device)
# !!! End of Placeholder !!!
```
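The placeholder above is untrained, so the accuracy numbers later in this section would be meaningless if you ran it as-is. If you don't have a saved checkpoint, the short training loop below is one way to get a usable baseline. This is a minimal sketch, not part of the original walkthrough; the hyperparameters are illustrative, and it reuses model, device, and transform defined above.

```python
# Optional: quickly train the placeholder model so the accuracy numbers later
# in this section are meaningful. Minimal sketch; hyperparameters are illustrative.
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)

train_criterion = nn.CrossEntropyLoss()
train_optimizer = optim.Adam(model.parameters(), lr=0.001)

model.train()
for epoch in range(1):  # a single epoch usually gives a reasonable MNIST baseline
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        train_optimizer.zero_grad()
        loss = train_criterion(model(images), labels)
        loss.backward()
        train_optimizer.step()
model.eval()
```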
Next, define the loss function and optimizer needed by ART's classifier wrapper, and grab a batch of test data:

```python
# Define loss function and optimizer (needed for the ART classifier wrapper)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Get a batch of test data
data_iter = iter(test_loader)
x_test_batch, y_test_batch = next(data_iter)
x_test_batch, y_test_batch = x_test_batch.to(device), y_test_batch.to(device)

# Convert to numpy for ART (some ART functions prefer numpy)
x_test_np = x_test_batch.cpu().numpy()
y_test_np = y_test_batch.cpu().numpy()  # Keep as integer labels
```

Wrapping the Model with ART

ART requires wrapping your native model (PyTorch, TensorFlow, etc.) in an ART classifier object. This wrapper provides a standardized API for attacks and defenses.

```python
from art.estimators.classification import PyTorchClassifier

# Wrap the PyTorch model with ART's PyTorchClassifier
classifier = PyTorchClassifier(
    model=model,
    loss=criterion,
    optimizer=optimizer,      # Not strictly needed for inference-time attacks, but good practice
    input_shape=(1, 28, 28),  # MNIST image shape (channels, height, width)
    nb_classes=10,            # Number of classes in MNIST
    clip_values=(0.0, 1.0)    # Input data range (MNIST tensors are typically [0, 1])
)
```

Now classifier is ready to be used with ART's attack implementations.

Implementing Projected Gradient Descent (PGD)

PGD is an iterative extension of FGSM. It takes multiple small steps in the gradient direction, projecting the result back onto the allowed perturbation set (an $L_p$ ball) after each step; the update rule is sketched after the parameter list below. This often finds more effective adversarial examples than single-step methods.

Important parameters for PGD:

- norm: The $L_p$ norm used to constrain the perturbation (e.g., np.inf for $L_\infty$, 2 for $L_2$). $L_\infty$ is common for images, limiting the maximum change per pixel.
- eps: Maximum perturbation magnitude $\epsilon$. Controls the "strength" of the attack.
- eps_step: Step size for each iteration. Should be smaller than eps.
- max_iter: Number of iterations. More iterations can find better examples but take longer.
- targeted: If True, try to make the model predict a specific target class. If False (default), try to cause any misclassification.
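To connect these parameters to the algorithm, the untargeted $L_\infty$ PGD update applied at each iteration can be written as follows, where $\alpha$ corresponds to eps_step and $\epsilon$ to eps:

$$x^{(t+1)} = \Pi_{\|x' - x\|_\infty \le \epsilon}\Big( x^{(t)} + \alpha \cdot \operatorname{sign}\big( \nabla_x \mathcal{L}(\theta, x^{(t)}, y) \big) \Big)$$

Here $\Pi$ denotes projection back onto the $\epsilon$-ball around the original input $x$ (followed by clipping to the valid data range), $\mathcal{L}$ is the classification loss, $\theta$ the model parameters, and $y$ the true label. With num_random_init > 0, the starting point $x^{(0)}$ is drawn at random from inside the $\epsilon$-ball instead of starting exactly at $x$.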
Let's implement the $L_\infty$ PGD attack:

```python
from art.attacks.evasion import ProjectedGradientDescent

# Configure the PGD attack
pgd_attack = ProjectedGradientDescent(
    estimator=classifier,
    norm=np.inf,        # Use the L-infinity norm
    eps=0.1,            # Maximum perturbation (epsilon) - adjust based on model/data
    eps_step=0.01,      # Step size per iteration
    max_iter=40,        # Number of iterations
    targeted=False,     # Untargeted attack
    num_random_init=1,  # Use random initialization for robustness
    batch_size=100
)

# Generate adversarial examples for the test batch
print("Generating PGD adversarial examples...")
x_test_adv_pgd = pgd_attack.generate(x=x_test_np)
print("PGD generation complete.")

# Evaluate the model on original and adversarial examples
predictions_clean = classifier.predict(x_test_np)
accuracy_clean = np.sum(np.argmax(predictions_clean, axis=1) == y_test_np) / len(y_test_np)
print(f"Accuracy on clean examples: {accuracy_clean * 100:.2f}%")

predictions_pgd = classifier.predict(x_test_adv_pgd)
accuracy_pgd = np.sum(np.argmax(predictions_pgd, axis=1) == y_test_np) / len(y_test_np)
print(f"Accuracy on PGD adversarial examples (eps={pgd_attack.eps:.2f}): {accuracy_pgd * 100:.2f}%")

# Calculate average L-infinity distortion
avg_linf_distortion_pgd = np.mean(np.max(np.abs(x_test_adv_pgd - x_test_np), axis=(1, 2, 3)))
print(f"Average L-infinity distortion (PGD): {avg_linf_distortion_pgd:.4f}")
```

You should observe a significant drop in accuracy on the adversarial examples compared to the clean ones, assuming your model wasn't specifically trained to be robust against PGD. The average $L_\infty$ distortion should be close to the specified eps value.

Implementing the Carlini & Wagner (C&W) L2 Attack

The C&W attacks are optimization-based, framing the search for an adversarial example as a constrained optimization problem (the objective being minimized is sketched below, after the attack configuration). The $L_2$ version is particularly effective at finding perturbations with low $L_2$ distance, meaning the overall magnitude of the change is minimized, often resulting in less visually perceptible perturbations than $L_\infty$ attacks produce at similar success rates.

Parameters for C&W $L_2$:

- confidence: The required margin between the logit of the adversarial (misclassified) class and the highest logit among the remaining classes. Higher values produce higher-confidence misclassifications but tend to increase distortion.
- learning_rate: Learning rate for the optimization process.
- binary_search_steps: Number of binary search steps used to find the trade-off constant between distortion and classification loss.
- max_iter: Maximum iterations for the optimization within each binary search step.
- batch_size: Process examples in batches.

Let's implement the C&W $L_2$ attack:

```python
from art.attacks.evasion import CarliniL2Method

# Configure the C&W L2 attack
# Note: C&W can be computationally expensive, especially with many iterations/steps.
# Reduce max_iter or binary_search_steps for faster execution if needed.
cw_attack = CarliniL2Method(
    classifier=classifier,
    confidence=0.0,         # Minimum confidence margin
    learning_rate=0.01,     # Optimizer learning rate
    binary_search_steps=5,  # Number of binary search steps
    max_iter=10,            # Max iterations per binary search step
    batch_size=100,
    targeted=False          # Untargeted attack
)
```
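For reference, the optimization problem this attack solves in its standard untargeted formulation is sketched below, with $Z(\cdot)$ denoting the model's logits, $y$ the true class, $\kappa$ the confidence parameter, and $c$ the trade-off constant found by binary search:

$$\min_{\delta}\; \|\delta\|_2^2 + c \cdot f(x + \delta), \qquad f(x') = \max\big( Z(x')_y - \max_{i \neq y} Z(x')_i,\; -\kappa \big)$$

The perturbed input must remain a valid image; the original method enforces this box constraint through a change of variables. A larger $\kappa$ (confidence) demands a larger logit margin, and binary_search_steps controls how many values of $c$ are tried.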
Now generate the adversarial examples and evaluate the model on them:

```python
# Generate adversarial examples
print("Generating C&W L2 adversarial examples...")
# Warning: This can be slow! Consider running on a smaller subset of x_test_np for exploration:
# x_test_adv_cw = cw_attack.generate(x=x_test_np[:10])  # Example: use the first 10 samples
x_test_adv_cw = cw_attack.generate(x=x_test_np)
print("C&W L2 generation complete.")

# Evaluate the model on C&W examples
# If you used a subset above, evaluate only on that subset and the corresponding labels:
# y_test_subset = y_test_np[:10]
# accuracy_cw = np.sum(np.argmax(classifier.predict(x_test_adv_cw), axis=1) == y_test_subset) / len(y_test_subset)
predictions_cw = classifier.predict(x_test_adv_cw)
accuracy_cw = np.sum(np.argmax(predictions_cw, axis=1) == y_test_np) / len(y_test_np)
print(f"Accuracy on C&W L2 adversarial examples: {accuracy_cw * 100:.2f}%")

# Calculate average L2 distortion
avg_l2_distortion_cw = np.mean(np.linalg.norm((x_test_adv_cw - x_test_np).reshape(len(x_test_np), -1), axis=1))
print(f"Average L2 distortion (C&W): {avg_l2_distortion_cw:.4f}")
```

Typically, C&W $L_2$ achieves a high attack success rate (low accuracy) while maintaining a lower average $L_2$ distortion than PGD reaches at a similar success rate, although a PGD variant constrained by the $L_2$ norm also exists (a sketch follows below). The trade-off is computational cost: C&W is significantly slower than PGD.
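To compare like with like, the sketch below runs PGD under an $L_2$ constraint so its distortion can be set against the C&W numbers. The parameter values are illustrative assumptions, and the snippet reuses classifier, x_test_np, and y_test_np from above:

```python
# Minimal sketch: PGD constrained by the L2 norm, for comparison with C&W L2.
# The eps budget is illustrative; L2 budgets on images are much larger than L-infinity ones.
pgd_l2_attack = ProjectedGradientDescent(
    estimator=classifier,
    norm=2,         # L2-constrained perturbation
    eps=2.0,        # Maximum L2 perturbation (tune for your model/data)
    eps_step=0.2,   # Step size per iteration
    max_iter=40,
    targeted=False,
    batch_size=100
)

x_test_adv_pgd_l2 = pgd_l2_attack.generate(x=x_test_np)

accuracy_pgd_l2 = np.mean(np.argmax(classifier.predict(x_test_adv_pgd_l2), axis=1) == y_test_np)
avg_l2_distortion_pgd = np.mean(
    np.linalg.norm((x_test_adv_pgd_l2 - x_test_np).reshape(len(x_test_np), -1), axis=1)
)
print(f"Accuracy on L2 PGD adversarial examples: {accuracy_pgd_l2 * 100:.2f}%")
print(f"Average L2 distortion (L2 PGD): {avg_l2_distortion_pgd:.4f}")
```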
Visualizing the Perturbations

It's insightful to visualize the original image, the adversarial version, and the perturbation itself.

```python
import matplotlib.pyplot as plt

# Select an example index (e.g., the first one)
idx = 0

# Ensure data is in the right format for matplotlib (H, W) or (H, W, C)
# Squeeze the channel dimension for MNIST
original_image = x_test_np[idx].squeeze()
adv_image_pgd = x_test_adv_pgd[idx].squeeze()
adv_image_cw = x_test_adv_cw[idx].squeeze()
perturbation_pgd = adv_image_pgd - original_image
perturbation_cw = adv_image_cw - original_image

# Get model predictions (softmax probabilities) for this specific example
pred_orig_probs = F.softmax(torch.tensor(predictions_clean[idx:idx+1]), dim=1).numpy().flatten()
pred_pgd_probs = F.softmax(torch.tensor(predictions_pgd[idx:idx+1]), dim=1).numpy().flatten()
pred_cw_probs = F.softmax(torch.tensor(predictions_cw[idx:idx+1]), dim=1).numpy().flatten()

pred_orig_label = np.argmax(pred_orig_probs)
pred_pgd_label = np.argmax(pred_pgd_probs)
pred_cw_label = np.argmax(pred_cw_probs)
true_label = y_test_np[idx]

# Plotting
fig, axes = plt.subplots(2, 3, figsize=(12, 8))

# Row 1: PGD
axes[0, 0].imshow(original_image, cmap='gray')
axes[0, 0].set_title(f"Original\nTrue: {true_label}, Pred: {pred_orig_label}\nConf: {pred_orig_probs[pred_orig_label]:.2f}")
axes[0, 0].axis('off')

# Enhance perturbation visibility for plotting:
# center the colormap around zero and use a diverging map
pert_vis_pgd = axes[0, 1].imshow(perturbation_pgd, cmap='coolwarm', vmin=-pgd_attack.eps, vmax=pgd_attack.eps)
axes[0, 1].set_title(f"PGD Perturbation ($L_\\infty={np.max(np.abs(perturbation_pgd)):.3f}$)\n(Scaled visually)")
axes[0, 1].axis('off')
fig.colorbar(pert_vis_pgd, ax=axes[0, 1], shrink=0.7)

axes[0, 2].imshow(adv_image_pgd, cmap='gray')
axes[0, 2].set_title(f"PGD Adversarial\nPred: {pred_pgd_label}\nConf: {pred_pgd_probs[pred_pgd_label]:.2f}")
axes[0, 2].axis('off')

# Row 2: C&W L2
axes[1, 0].imshow(original_image, cmap='gray')
axes[1, 0].set_title(f"Original\nTrue: {true_label}, Pred: {pred_orig_label}\nConf: {pred_orig_probs[pred_orig_label]:.2f}")
axes[1, 0].axis('off')

# Calculate the L2 norm of this specific C&W perturbation
l2_pert_cw = np.linalg.norm(perturbation_cw.flatten())
pert_vis_cw = axes[1, 1].imshow(perturbation_cw, cmap='coolwarm',
                                vmin=-np.abs(perturbation_cw).max(),
                                vmax=np.abs(perturbation_cw).max())
axes[1, 1].set_title(f"C&W Perturbation ($L_2={l2_pert_cw:.3f}$)\n(Scaled visually)")
axes[1, 1].axis('off')
fig.colorbar(pert_vis_cw, ax=axes[1, 1], shrink=0.7)

axes[1, 2].imshow(adv_image_cw, cmap='gray')
axes[1, 2].set_title(f"C&W L2 Adversarial\nPred: {pred_cw_label}\nConf: {pred_cw_probs[pred_cw_label]:.2f}")
axes[1, 2].axis('off')

plt.tight_layout()
plt.show()
```

Observe how the adversarial examples look very similar to the original to the human eye, yet the model's prediction changes significantly. Notice the differences in structure and magnitude between the PGD ($L_\infty$) and C&W ($L_2$) perturbations: PGD often spends the full $\epsilon$ budget on many pixels, while C&W $L_2$ tends to make smoother, more distributed changes.

Exploration and Next Steps

This practical provides a starting point for implementing evasion attacks. Consider these next steps:

- Vary Parameters: Experiment with different eps, eps_step, and max_iter values for PGD, and confidence, learning_rate, max_iter, and binary_search_steps for C&W. Observe how these affect the attack success rate and the distortion ($L_\infty$, $L_2$).
- Different Norms: Implement PGD with norm=2 and compare its results (accuracy, $L_2$ distortion) to PGD $L_\infty$ and C&W $L_2$.
- Targeted Attacks: Modify the attacks to use targeted=True. You will need to provide target labels (e.g., y_target = (y_test_np + 1) % 10). Analyze whether targeted attacks are harder or easier to generate; a minimal sketch follows at the end of this section.
- Other Attacks: Explore other evasion attacks available in ART, such as FGSM (FastGradientMethod), the Basic Iterative Method (BIM; essentially PGD with num_random_init=0), DeepFool (DeepFool), or decision-based attacks like the Boundary Attack (BoundaryAttack).
- Different Models/Datasets: Apply these attacks to more complex models (e.g., ResNets) and datasets (e.g., CIFAR-10). Note that parameters like eps may need significant adjustment depending on the dataset and input normalization.

By implementing and experimenting with these foundational attacks, you gain practical insight into the mechanics of crafting adversarial examples, which is essential for understanding both offensive capabilities and the challenges in building defenses.
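As a concrete starting point for the targeted-attack exercise above, here is a minimal sketch with illustrative parameter values; it reuses classifier, x_test_np, and y_test_np from earlier and uses shifted labels as targets:

```python
# Minimal sketch of a targeted L-infinity PGD attack (illustrative parameters).
# Each target label is simply "true label + 1", as suggested in the exercise above.
y_target = (y_test_np + 1) % 10

pgd_targeted = ProjectedGradientDescent(
    estimator=classifier,
    norm=np.inf,
    eps=0.15,
    eps_step=0.01,
    max_iter=40,
    targeted=True,   # the attack now tries to push predictions toward y_target
    batch_size=100
)

# For targeted attacks, generate() takes the target labels as y
x_test_adv_targeted = pgd_targeted.generate(x=x_test_np, y=y_target)

preds_targeted = np.argmax(classifier.predict(x_test_adv_targeted), axis=1)
success_rate = np.mean(preds_targeted == y_target)  # fraction classified as the target class
print(f"Targeted PGD success rate: {success_rate * 100:.2f}%")
```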