While previous discussions centered on evasion attacks, which manipulate inputs after a model is trained, this chapter shifts focus to attacks occurring during the training phase itself. We examine how attackers can corrupt the learning process by injecting malicious data or embedding hidden functionalities.
Specifically, we will study data poisoning, where tainted training samples compromise the integrity or availability of the model. You will learn to distinguish attacks that degrade overall performance (availability attacks) from those that cause specific, attacker-chosen misclassifications (integrity attacks), and we will explore techniques for crafting targeted poisoning data.
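To make the distinction concrete, here is a minimal NumPy sketch of label-flipping poisoning. The function and parameter names (`flip_labels`, `targeted_flip`, `poison_fraction`) are illustrative rather than taken from the chapter, and labels are assumed to be a NumPy integer array. The first function degrades overall accuracy (availability); the second relabels only a chosen source class to induce specific misclassifications (integrity).

```python
import numpy as np

def flip_labels(y, poison_fraction, num_classes, rng=None):
    """Availability-style poisoning: randomly reassign the labels of a
    fraction of the training set to degrade overall accuracy."""
    rng = rng or np.random.default_rng(0)
    y_poisoned = y.copy()
    n_poison = int(poison_fraction * len(y))
    idx = rng.choice(len(y), size=n_poison, replace=False)
    for i in idx:
        # Pick any label other than the current one.
        wrong_labels = [c for c in range(num_classes) if c != y_poisoned[i]]
        y_poisoned[i] = rng.choice(wrong_labels)
    return y_poisoned, idx

def targeted_flip(y, source_class, target_class, poison_fraction, rng=None):
    """Integrity-style poisoning: relabel only samples of a chosen source
    class as the target class, leaving the rest of the data untouched."""
    rng = rng or np.random.default_rng(0)
    y_poisoned = y.copy()
    candidates = np.flatnonzero(y == source_class)
    n_poison = int(poison_fraction * len(candidates))
    idx = rng.choice(candidates, size=n_poison, replace=False)
    y_poisoned[idx] = target_class
    return y_poisoned, idx
```

Both functions return the indices of the poisoned samples, which is convenient later when analyzing how the corrupted points influenced training.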
We will also cover backdoor attacks, which implant hidden triggers into the model during training: the model behaves normally on most inputs but misbehaves predictably whenever the attacker-defined trigger is present. We will examine trigger design and the mechanisms used to insert these backdoors. A significant part of the chapter covers clean-label attacks, a subtle form of poisoning in which the manipulated data still appears correctly labeled to a human inspector.
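As a preview, the sketch below shows one common dirty-label backdoor construction on image data: stamp a small pixel patch in a corner of a fraction of the training images and relabel them as the attacker's target class. Images are assumed to be a NumPy array of shape (N, H, W) with values in [0, 1], and all names (`add_backdoor`, `apply_trigger`, `patch_size`) are illustrative. A clean-label variant would instead perturb features while leaving labels unchanged, as covered later in the chapter.

```python
import numpy as np

def add_backdoor(images, labels, poison_fraction, target_class,
                 patch_size=3, patch_value=1.0, rng=None):
    """Dirty-label backdoor: stamp a small bright patch in the bottom-right
    corner of a fraction of the training images and relabel them as the
    attacker's target class."""
    rng = rng or np.random.default_rng(0)
    x, y = images.copy(), labels.copy()
    n_poison = int(poison_fraction * len(x))
    idx = rng.choice(len(x), size=n_poison, replace=False)
    x[idx, -patch_size:, -patch_size:] = patch_value   # the trigger pattern
    y[idx] = target_class                              # labels are changed (not clean-label)
    return x, y, idx

def apply_trigger(image, patch_size=3, patch_value=1.0):
    """At inference time, stamping the same patch on any input should
    activate the hidden behaviour of a successfully backdoored model."""
    triggered = image.copy()
    triggered[-patch_size:, -patch_size:] = patch_value
    return triggered
```

Because only a small fraction of samples carry the trigger, a model trained on this data can keep its accuracy on clean inputs while learning the attacker's hidden rule.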
The chapter then presents methods for analyzing the impact of these training-time attacks on the final model. Practical sections provide hands-on experience implementing basic poisoning and backdoor scenarios against machine learning models.
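As a rough illustration of what such an analysis measures, the sketch below computes two standard quantities for a model trained on poisoned data: clean accuracy on benign inputs and the attack success rate on triggered inputs. It assumes a model exposing a scikit-learn-style `predict` method that accepts the same array layout as the training data; `evaluate_poisoning_impact` and `trigger_fn` are illustrative names.

```python
import numpy as np

def evaluate_poisoning_impact(model, x_clean, y_clean, trigger_fn, target_class):
    """Report two standard measurements for a model trained on poisoned data:
    clean accuracy on benign inputs, and the attack success rate, i.e. how
    often a triggered input is classified as the attacker's target class."""
    clean_accuracy = np.mean(model.predict(x_clean) == y_clean)

    # Only trigger inputs whose true class differs from the target class,
    # so the success rate is not inflated by already-correct predictions.
    mask = y_clean != target_class
    x_triggered = np.stack([trigger_fn(x) for x in x_clean[mask]])
    attack_success_rate = np.mean(model.predict(x_triggered) == target_class)
    return clean_accuracy, attack_success_rate
```

A successful backdoor typically keeps clean accuracy close to that of an unpoisoned model while driving the attack success rate high, which is exactly what makes these attacks hard to notice from aggregate metrics alone.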
3.1 Poisoning Attack Strategies: Availability vs Integrity
3.2 Targeted Data Poisoning Techniques
3.3 Backdoor Attack Mechanisms and Trigger Design
3.4 Clean-Label Poisoning Attacks
3.5 Analyzing Poisoning Impact on Model Training
3.6 Crafting Data Poisoning Attacks: Hands-on Practical