Having detailed how attackers exploit vulnerabilities in machine learning systems through methods like evasion, poisoning, and inference attacks, we now shift focus to constructing models capable of withstanding such manipulations. This chapter introduces practical techniques for enhancing model security.
You will study adversarial training, a prominent method that incorporates adversarial examples, such as those generated via Projected Gradient Descent (PGD), directly into the training process. This is often formulated as the minimax optimization problem $\min_{\theta} \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\max_{\delta \in \mathcal{S}} L(\theta, x+\delta, y)\right]$. We will examine certified defenses, such as randomized smoothing, which provide mathematical guarantees on model resilience within a defined perturbation radius $r$. Additionally, you'll analyze input transformation techniques, understand how obfuscated gradients can give a false sense of security, and explore strategies designed to counter data poisoning and backdoor threats. Hands-on sections will guide you through implementing key defense mechanisms such as adversarial training.
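To make the minimax formulation concrete, the following minimal sketch shows one way the inner maximization (a PGD attack on the inputs) and the outer minimization (a standard parameter update on the perturbed inputs) might fit together. It assumes a PyTorch classifier with inputs scaled to [0, 1]; the function names and hyperparameters (`eps`, `alpha`, `steps`) are illustrative placeholders, not the exact code used in the hands-on section.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Inner maximization: find a perturbation delta within the
    L-infinity ball of radius eps that (approximately) maximizes the loss."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        # Ascend the loss, then project back into the eps-ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return delta.detach()

def adversarial_training_step(model, optimizer, x, y):
    """Outer minimization: update parameters on adversarial examples."""
    model.eval()                      # keep BN/dropout stable while attacking
    delta = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    # Clamp to [0, 1] assuming image inputs in that range.
    loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Switching the model to evaluation mode while generating the attack is one common design choice: it keeps batch-norm statistics from being influenced by the perturbed inputs before the actual training update.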
5.1 Adversarial Training: Principles and Variations
5.2 Certified Defenses: Randomized Smoothing
5.3 Input Transformation Defenses
5.4 Gradient Masking and Obfuscation Issues
5.5 Defending Against Poisoning and Backdoors
5.6 Implementing Adversarial Training: Hands-on Practical