Having examined methods for attacking machine learning models and strategies to defend against them, we now turn to determining how well those defenses actually perform. Implementing a defense mechanism is not enough; we need dependable ways to measure a model's security against potential threats. This chapter focuses on the methods and practices for rigorously assessing model security.
You will learn about standard metrics used to quantify security, such as accuracy under attack and the minimum perturbation magnitude needed to cause misclassification, typically measured with Lp norms such as L0, L2, or L∞. We will examine common benchmarking tools and frameworks, such as ART and CleverHans, that support these evaluations. A significant part of evaluation involves designing strong, adaptive attacks tailored to probe the limits of a specific defense, avoiding the false sense of security that weak assessments can provide. We will also discuss how to conduct evaluations under different attacker assumptions (threat models) and how to interpret the resulting security assessments effectively. By the end of this chapter, you will be able to set up and run systematic security evaluations for machine learning models.
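To make the first of these ideas concrete before the detailed sections, here is a minimal sketch of how accuracy under attack and per-example perturbation magnitudes might be computed with NumPy. It assumes you already have a batch of clean inputs, their adversarial counterparts, and the model's predictions on the adversarial inputs; the array names are illustrative and not tied to any particular library.

```python
import numpy as np

def robust_accuracy(y_true, y_pred_adv):
    """Fraction of examples the model still classifies correctly under attack."""
    return float(np.mean(y_true == y_pred_adv))

def perturbation_norms(x_clean, x_adv, ord=np.inf):
    """Per-example Lp distance between clean and adversarial inputs."""
    deltas = (x_adv - x_clean).reshape(len(x_clean), -1)
    return np.linalg.norm(deltas, ord=ord, axis=1)

# Hypothetical arrays: x_clean and x_adv are input batches, y_true holds the
# ground-truth labels, and y_pred_adv holds the model's predictions on x_adv.
# acc  = robust_accuracy(y_true, y_pred_adv)
# linf = perturbation_norms(x_clean, x_adv, ord=np.inf)  # L-infinity distances
# l2   = perturbation_norms(x_clean, x_adv, ord=2)       # L2 distances
# print(f"Accuracy under attack: {acc:.3f}, median L-inf: {np.median(linf):.4f}")
```

The chapter's later sections build on exactly these quantities, first defining the metrics precisely and then showing how frameworks such as ART automate their computation across attack configurations.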
6.1 Metrics for Adversarial Robustness
6.2 Benchmarking Tools and Frameworks
6.3 Adaptive Attacks: Evaluating Defenses Properly
6.4 Security Evaluations under Different Threat Models
6.5 Interpreting Robustness Evaluation Results
6.6 Setting up Robustness Benchmarks: Hands-on Practical