Understanding that data poisoning and backdoor attacks can compromise model training is one thing; quantifying how and to what extent they succeed is another critical step. Once malicious data or hidden triggers have made their way into the training set, we need robust methods to analyze the resulting impact on the model's behavior and on the learning process itself. This analysis helps us understand the attack's effectiveness, diagnose model failures, and inform defense strategies.
The primary goals of analysis typically fall into two categories: measuring the degradation of the model's intended functionality and verifying the success of the attacker's specific malicious objective (like a backdoor trigger).
Poisoning attacks, especially those aiming to reduce availability, seek to degrade the model's overall performance on its primary task. The most straightforward way to measure this is with standard evaluation metrics, applied carefully.
Standard Metrics on Clean Test Data: Evaluate the poisoned model on a pristine, held-out test set (containing no poison or triggers). Compare metrics like accuracy, precision, recall, F1-score, or AUC against a baseline model trained only on clean data. A significant drop in these metrics indicates successful availability poisoning.
For a classification task, let $M_{\text{clean}}$ be the model trained on the clean data $D_{\text{clean}}$, and $M_{\text{poisoned}}$ the model trained on the poisoned dataset $D_{\text{poisoned}} = D_{\text{clean}} \cup D_{\text{poison}}$. We compare the accuracy on a clean test set $D_{\text{test\_clean}}$:

$$\text{Acc}(M_{\text{poisoned}}, D_{\text{test\_clean}}) \quad \text{vs.} \quad \text{Acc}(M_{\text{clean}}, D_{\text{test\_clean}})$$

A lower value for $\text{Acc}(M_{\text{poisoned}}, D_{\text{test\_clean}})$ suggests the poisoning degraded general performance.
Loss on Clean Data: Similarly, examine the average loss of the poisoned model on the clean test set. Higher loss compared to the baseline model often correlates with poorer generalization and performance degradation caused by the poison.
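As a minimal sketch of this comparison, assuming two hypothetical PyTorch models (`model_clean`, `model_poisoned`) and a clean test `DataLoader` (`clean_test_loader`), both accuracy and average loss on clean data can be computed with the same helper:

```python
import torch
import torch.nn.functional as F

def evaluate_on_clean(model, loader, device="cpu"):
    """Return (accuracy, average cross-entropy loss) on a clean test loader."""
    model.eval()
    correct, total, loss_sum = 0, 0, 0.0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            logits = model(inputs)
            loss_sum += F.cross_entropy(logits, labels, reduction="sum").item()
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return correct / total, loss_sum / total

# acc_clean, loss_clean = evaluate_on_clean(model_clean, clean_test_loader)
# acc_poisoned, loss_poisoned = evaluate_on_clean(model_poisoned, clean_test_loader)
# A noticeably lower acc_poisoned (or higher loss_poisoned) indicates availability degradation.
```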
For integrity attacks or backdoors, the attacker has a specific malicious goal, such as causing targeted misclassification or activating a hidden behavior via a trigger. Measuring success requires evaluating these specific outcomes.
Attack Success Rate (ASR): This is the primary metric for targeted attacks and backdoors. Construct an evaluation set that satisfies the attacker's condition, for example clean test samples with the backdoor trigger applied (excluding samples that already belong to the target class), and measure the fraction that the poisoned model classifies as the attacker's intended target label:

$$\text{ASR} = \frac{\text{number of attacker-conditioned inputs classified as the target class}}{\text{number of attacker-conditioned inputs}}$$

An ASR close to 1 means the malicious objective is achieved reliably.
Benign Accuracy / Clean Accuracy: A crucial aspect of stealthy backdoors or clean-label attacks is that the model should still perform well on normal, benign inputs (inputs without the trigger). Therefore, alongside ASR, always measure the model's accuracy on the original clean test set (Dtest_clean). A successful, stealthy attack achieves a high ASR while maintaining high accuracy on clean data. If clean accuracy drops significantly, the attack is less subtle, though potentially still damaging.
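A hedged sketch of the ASR computation, assuming a hypothetical `apply_trigger` function that stamps the backdoor pattern onto a batch, along with the same `model_poisoned` and `clean_test_loader` as above:

```python
import torch

def attack_success_rate(model, clean_loader, apply_trigger, target_class, device="cpu"):
    """Fraction of triggered inputs classified as the attacker's target class.

    `apply_trigger` is a hypothetical function that applies the backdoor trigger
    to a batch of inputs; samples already labeled with the target class are
    excluded so they do not inflate the rate.
    """
    model.eval()
    hits, total = 0, 0
    with torch.no_grad():
        for inputs, labels in clean_loader:
            keep = labels != target_class          # exclude the target class itself
            if keep.sum() == 0:
                continue
            triggered = apply_trigger(inputs[keep]).to(device)
            preds = model(triggered).argmax(dim=1)
            hits += (preds == target_class).sum().item()
            total += keep.sum().item()
    return hits / total

# asr = attack_success_rate(model_poisoned, clean_test_loader, apply_trigger, target_class=0)
# Report ASR together with clean accuracy from the previous sketch to judge how stealthy the attack is.
```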
Poisoning can alter how the model learns. Analyzing the training dynamics can provide insights into the attack's influence.
Learning Curves: Plot the training and validation loss/accuracy over epochs for both the clean and poisoned training processes. Poisoning might manifest as slower convergence, persistently higher validation loss, or a lower final validation accuracy compared to clean training.
Figure: validation accuracy curves for models trained on clean versus poisoned data. The poisoned model shows lower accuracy and potentially slower convergence.
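Assuming each training run records its per-epoch validation accuracy in a list (hypothetical `val_acc_clean` and `val_acc_poisoned`), a minimal sketch for overlaying the two curves:

```python
import matplotlib.pyplot as plt

# Hypothetical per-epoch validation accuracies recorded during each training run.
epochs = range(1, len(val_acc_clean) + 1)

plt.plot(epochs, val_acc_clean, label="trained on clean data")
plt.plot(epochs, val_acc_poisoned, label="trained on poisoned data")
plt.xlabel("Epoch")
plt.ylabel("Validation accuracy (clean test set)")
plt.title("Learning curves: clean vs. poisoned training")
plt.legend()
plt.show()
```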
Model Parameter Analysis: Examine the weights and biases of the poisoned model compared to the clean model. Large deviations in weight norms or specific parameter values might indicate the poisoning's effect. However, interpreting these changes directly can be challenging in complex models like deep neural networks.
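As an illustrative sketch rather than a definitive diagnostic, and assuming both models share the same architecture and parameter names, the relative parameter drift per layer can be computed as below. Keep in mind that two independently trained models differ even without poisoning, so this is only a coarse signal:

```python
import torch

def layerwise_weight_drift(model_clean, model_poisoned):
    """Relative L2 difference between corresponding parameters of the two models."""
    poisoned_params = dict(model_poisoned.named_parameters())
    drift = {}
    for name, p_clean in model_clean.named_parameters():
        p_poisoned = poisoned_params[name]
        diff = (p_poisoned - p_clean).norm().item()
        drift[name] = diff / (p_clean.norm().item() + 1e-12)
    return drift

# Inspect the layers that moved the most relative to the clean baseline.
# top = sorted(layerwise_weight_drift(model_clean, model_poisoned).items(),
#              key=lambda kv: kv[1], reverse=True)[:5]
# for name, d in top:
#     print(f"{name}: relative drift {d:.3f}")
```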
Internal Representation Analysis: Techniques like t-SNE or PCA can visualize the learned feature representations in the model's hidden layers. Apply these to both clean inputs and inputs relevant to the attack (e.g., triggered inputs for backdoors). Poisoning might cause representations of triggered inputs to cluster incorrectly near the target class representation or distort the overall feature space.
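A minimal sketch of such a visualization, assuming a hypothetical handle `feature_layer` to one of the poisoned model's hidden layers and the `apply_trigger` helper from earlier:

```python
import numpy as np
import torch
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def collect_features(model, feature_layer, inputs, device="cpu"):
    """Capture hidden-layer activations for a batch via a forward hook."""
    feats = []
    hook = feature_layer.register_forward_hook(
        lambda module, inp, out: feats.append(out.detach().flatten(1).cpu()))
    model.eval()
    with torch.no_grad():
        model(inputs.to(device))
    hook.remove()
    return torch.cat(feats).numpy()

# Hypothetical batch of clean test inputs and the same batch with the trigger applied.
clean_feats = collect_features(model_poisoned, feature_layer, clean_batch)
trig_feats = collect_features(model_poisoned, feature_layer, apply_trigger(clean_batch))

embedded = TSNE(n_components=2, perplexity=30).fit_transform(
    np.concatenate([clean_feats, trig_feats]))
n = len(clean_feats)
plt.scatter(embedded[:n, 0], embedded[:n, 1], s=8, label="clean inputs")
plt.scatter(embedded[n:, 0], embedded[n:, 1], s=8, label="triggered inputs")
plt.title("t-SNE of hidden representations (poisoned model)")
plt.legend()
plt.show()
```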
More sophisticated methods can trace the influence of individual training points.
Influence Functions: These techniques approximate the effect of removing or upweighting a specific training point on the model's parameters or its prediction on a test point. They can potentially identify training samples (including poison points) that have a disproportionately high impact on specific misclassifications or overall loss. While powerful, influence functions can be computationally expensive, especially for large models and datasets.
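Exact influence functions require inverse-Hessian-vector products, which is beyond a short example. The sketch below instead uses a simpler first-order approximation (in the spirit of gradient-similarity methods such as TracIn) that scores each training point by the dot product between its loss gradient and the gradient at a test point of interest; the model and example tensors are assumptions:

```python
import torch
import torch.nn.functional as F

def flat_grad(model, x, y):
    """Flattened loss gradient for a single example (x: input tensor, y: integer label tensor)."""
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
    return torch.cat([g.flatten() for g in grads])

def first_order_influence(model, train_examples, x_test, y_test):
    """Score each training example by grad(train) . grad(test).

    A large positive score means a gradient step on that training point would also
    reduce the loss on the test point, i.e. the point pushes the model toward that
    prediction; unusually large scores can flag poison candidates.
    """
    g_test = flat_grad(model, x_test, y_test)
    scores = []
    for x_tr, y_tr in train_examples:
        g_tr = flat_grad(model, x_tr, y_tr)
        scores.append(torch.dot(g_tr, g_test).item())
    return scores
```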
Loss and Gradient Analysis during Training: Monitoring the loss values of individual training samples can sometimes highlight anomalies. Poisoned samples might consistently exhibit unusually high or low loss compared to clean samples, depending on the attack strategy. Similarly, analyzing gradient norms or directions associated with poison points might reveal suspicious patterns.
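A minimal sketch for per-sample loss monitoring, assuming an unshuffled training `DataLoader` so that indices map back to dataset order; the three-standard-deviation outlier rule at the end is just one possible heuristic:

```python
import torch
import torch.nn.functional as F

def per_sample_losses(model, loader, device="cpu"):
    """Loss of every individual training sample under the current model."""
    model.eval()
    losses = []
    with torch.no_grad():
        for inputs, labels in loader:
            logits = model(inputs.to(device))
            batch_losses = F.cross_entropy(logits, labels.to(device), reduction="none")
            losses.extend(batch_losses.cpu().tolist())
    return torch.tensor(losses)

# losses = per_sample_losses(model_poisoned, train_loader_unshuffled)
# threshold = losses.mean() + 3 * losses.std()        # simple outlier rule (assumption)
# suspicious_indices = (losses > threshold).nonzero().flatten()
```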
Neuron Activation Analysis (for Backdoors): Backdoor triggers often rely on activating specific internal neurons or patterns disproportionately. Techniques like network dissection or analyzing activation maps for triggered vs. non-triggered inputs can sometimes pinpoint neurons hijacked by the backdoor mechanism. This involves observing which neurons fire strongly and consistently only when the trigger is present.
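A hedged sketch of this comparison, assuming a hypothetical `suspect_layer` handle inside the poisoned model and the `apply_trigger` helper from earlier; it averages activations per neuron (or channel) over a batch and ranks the largest triggered-versus-clean gaps:

```python
import torch

def mean_activation(model, layer, inputs, device="cpu"):
    """Average activation of one layer over a batch, one value per neuron/channel."""
    acts = []
    hook = layer.register_forward_hook(
        lambda module, inp, out: acts.append(out.detach().cpu()))
    model.eval()
    with torch.no_grad():
        model(inputs.to(device))
    hook.remove()
    a = acts[0]
    # Average over the batch (and any spatial dims), keeping the channel dimension.
    return a.mean(dim=[d for d in range(a.dim()) if d != 1])

# act_clean = mean_activation(model_poisoned, suspect_layer, clean_batch)
# act_trig = mean_activation(model_poisoned, suspect_layer, apply_trigger(clean_batch))
# gap = act_trig - act_clean
# print(gap.topk(5))  # neurons/channels that fire much more strongly only when the trigger is present
```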
Effective analysis requires a careful experimental setup: a baseline model trained only on clean data with the same architecture and hyperparameters, a pristine held-out clean test set for measuring benign accuracy, a separate attacker-conditioned (e.g., triggered) test set for measuring ASR, and, where feasible, multiple runs with different random seeds to separate the poison's effect from ordinary training variance.
By employing these analysis techniques, you can gain a much clearer picture of how training-time attacks affect machine learning models, moving beyond simple detection to a quantitative understanding of their impact. This knowledge is fundamental for developing and evaluating effective defenses against data poisoning and backdoors.