As introduced, data poisoning attacks subvert the machine learning model during its training phase by manipulating the training dataset. However, not all poisoning attacks share the same objective. The specific goal of the attacker dictates the strategy employed, primarily falling into two categories: availability attacks and integrity attacks. Understanding this distinction is fundamental to recognizing and defending against these threats.
Availability attacks, sometimes called indiscriminate attacks, aim to degrade the overall performance of the trained model. The attacker's goal isn't necessarily to cause specific misclassifications but rather to reduce the model's general utility, often measured by a drop in accuracy across the board or for multiple classes. Think of it as maximizing the model's error rate on clean, unseen data after it has been trained on the poisoned dataset.
How is this achieved? The attacker typically injects noisy or contradictory data points into the training set. These poisoned samples might:

- carry flipped or otherwise incorrect labels (label flipping), forcing the model to fit contradictory evidence,
- pair features characteristic of one class with the label of another, increasing the overlap between classes, or
- act as high-influence outliers that pull the learned parameters away from values that generalize well.
Consider a loss function $L(\theta, D)$, where $\theta$ represents the model parameters and $D = D_{\text{clean}} \cup D_{\text{poison}}$ is the training dataset. An availability attack aims to craft $D_{\text{poison}}$ such that the parameters $\theta^*$ minimizing the loss on $D$, $\theta^* = \arg\min_{\theta} L(\theta, D)$, yield poor performance (high loss or error) when evaluated on a clean test set $D_{\text{test}}$. The injected data effectively hinders the model's ability to generalize from the training data.
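To make this concrete, here is a minimal sketch of an availability attack via random label flipping on a synthetic dataset. The dataset, poisoning rates, and logistic regression model are illustrative choices rather than a specific attack from the literature; the point is simply that as more contradictory labels enter $D$, accuracy on the clean test set falls.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification task (illustrative setup).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

def flip_labels(y, rate, rng):
    """Return a copy of y with a random fraction of labels flipped."""
    y_poisoned = y.copy()
    flip_idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y_poisoned[flip_idx] = 1 - y_poisoned[flip_idx]  # binary label flip
    return y_poisoned

rng = np.random.default_rng(0)
for rate in [0.0, 0.1, 0.3, 0.5]:
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, flip_labels(y_train, rate, rng))
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"poison rate {rate:.0%}: clean test accuracy = {acc:.3f}")
```

Higher flip rates push the minimizer $\theta^*$ of the poisoned loss further from the parameters a clean training run would find, which shows up as a broad accuracy drop rather than any specific, targeted failure.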
An attacker might choose an availability strategy simply to sabotage a competitor's model or disrupt a service relying on machine learning, without needing fine-grained control over how the model fails. These attacks can sometimes be easier to mount than integrity attacks, as generating broadly disruptive noise might require less precise manipulation. However, significant drops in overall performance might also be more readily detected through standard validation procedures.
Integrity attacks, or targeted attacks, are more surgical. The attacker's objective is to cause the model to misbehave in very specific, predetermined ways, while ideally maintaining normal performance on most other inputs. This makes the attack stealthier and potentially more damaging if the targeted failure has significant consequences.
Common goals for integrity attacks include:

- **Targeted misclassification:** forcing a specific input, or a small set of inputs, to receive an attacker-chosen label while other predictions remain correct.
- **Trigger-based behavior:** making the model misbehave only when a particular pattern (a trigger) appears in the input, the mechanism behind the backdoor attacks discussed later.
- **Selective evasion:** ensuring that inputs from a particular source, such as a specific sender or user, consistently bypass a filter or authentication check.
Crafting integrity attacks requires more sophistication. The attacker must design poison samples $D_{\text{poison}}$ that subtly shift the model's decision boundary in a precise location or create specific internal representations sensitive to a trigger. These poison points are often carefully optimized to be close to the target instance(s) in the feature space or to mimic legitimate data points associated with the wrong class (as seen in clean-label attacks).
The optimization objective is different here. The attacker wants to find $D_{\text{poison}}$ such that the resulting model $\theta^* = \arg\min_{\theta} L(\theta, D_{\text{clean}} \cup D_{\text{poison}})$ satisfies the attacker's specific goal (e.g., $f_{\theta^*}(x_{\text{target}}) = y_{\text{attacker}}$) while keeping the performance on the general clean test set $D_{\text{test}}$ largely unchanged, $L(\theta^*, D_{\text{test}}) \approx L(\theta^*_{\text{clean}}, D_{\text{test}})$, where $\theta^*_{\text{clean}}$ is the model trained only on clean data.
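For contrast, the sketch below mounts a crude targeted attack on the same kind of synthetic setup. It picks a low-margin test point as the target, injects a small cluster of poison points near it in feature space carrying the attacker's desired label, and then checks both conditions from the objective above: the target's prediction flips while accuracy on the rest of the clean test set barely moves. The target selection, number of poison points, and perturbation scale are illustrative assumptions; real attacks optimize the poison points far more carefully.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Choose a target the clean model classifies correctly but with a small
# margin, so a modest local shift of the boundary can flip it.
margins = np.abs(clean_model.decision_function(X_test))
correct = clean_model.predict(X_test) == y_test
target_idx = int(np.argmin(np.where(correct, margins, np.inf)))
x_target, y_true = X_test[target_idx], y_test[target_idx]
y_attacker = 1 - y_true

# Poison points: small perturbations of the target, labeled with the
# attacker's desired class (a crude stand-in for optimized poisons).
rng = np.random.default_rng(1)
n_poison = 25
X_poison = x_target + 0.05 * rng.standard_normal((n_poison, X.shape[1]))
y_poison = np.full(n_poison, y_attacker)

poisoned_model = LogisticRegression(max_iter=1000).fit(
    np.vstack([X_train, X_poison]), np.concatenate([y_train, y_poison]))

print("clean model predicts target as:   ",
      clean_model.predict(x_target.reshape(1, -1))[0])
print("poisoned model predicts target as:",
      poisoned_model.predict(x_target.reshape(1, -1))[0])
print("clean test accuracy (clean model):    %.3f"
      % accuracy_score(y_test, clean_model.predict(X_test)))
print("clean test accuracy (poisoned model): %.3f"
      % accuracy_score(y_test, poisoned_model.predict(X_test)))
```

Because the poison cluster is tiny relative to the clean training data, the global decision boundary moves only slightly, which is exactly what makes such attacks hard to spot with aggregate accuracy metrics.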
Integrity attacks are often preferred when the attacker has a specific outcome in mind, such as bypassing an authentication system for a particular user or causing a competitor's product to fail on a specific benchmark input. Their stealthiness makes them harder to detect using simple accuracy monitoring.
| Feature | Availability Attack | Integrity Attack |
|---|---|---|
| Goal | Degrade overall model performance | Cause specific, targeted misbehavior |
| Scope | Indiscriminate, affects many inputs | Targeted, affects specific inputs/triggers |
| Impact | Reduced general accuracy/utility | Specific failure modes, potential stealth |
| Mechanism | Inject noise, increase feature overlap | Precise boundary manipulation, triggers |
| Detectability | Potentially easier (performance drop) | Potentially harder (stealthy) |
| Example | Degrading a spam filter's overall F1 score | Making a spam filter always allow emails from attacker@evil.com |
The following visualization illustrates the difference. Availability attacks might add noise that broadly confuses the boundary, while integrity attacks carefully place points to shift the boundary locally, causing a specific target (star) to be misclassified.
*Figure: A 2D feature space showing clean data for two classes (A and B), the original decision boundary, an integrity attack target (star), integrity poison points placed to misclassify that target, and availability poison points adding general noise near the boundary. Lines indicate how the decision boundary shifts under each type of poisoning.*
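If you want to generate a figure along these lines yourself, the following matplotlib sketch builds a toy 2D version: two Gaussian classes, a clean decision boundary, a cluster of integrity poison points around a class-B target but labeled as class A, and diffuse availability poison points with random labels. The class means, poison placements, and plotting choices are all illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Two 2D Gaussian classes (toy data for visualization only).
X_a = rng.normal(loc=[-2.0, 0.0], scale=0.8, size=(100, 2))
X_b = rng.normal(loc=[2.0, 0.0], scale=0.8, size=(100, 2))
X_clean = np.vstack([X_a, X_b])
y_clean = np.array([0] * 100 + [1] * 100)

# Integrity poison: points near a class-B target, mislabeled as class A.
x_target = np.array([1.0, 0.5])
X_integrity = x_target + 0.15 * rng.standard_normal((20, 2))
y_integrity = np.zeros(20, dtype=int)

# Availability poison: diffuse points near the boundary with random labels.
X_avail = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(40, 2))
y_avail = rng.integers(0, 2, size=40)

def draw_boundary(X, y, ax, color, linestyle, label):
    """Fit a logistic regression and draw its zero-level decision contour."""
    model = LogisticRegression().fit(X, y)
    xx, yy = np.meshgrid(np.linspace(-4.5, 4.5, 300), np.linspace(-3.5, 3.5, 300))
    zz = model.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contour(xx, yy, zz, levels=[0], colors=[color], linestyles=[linestyle])
    ax.plot([], [], color=color, linestyle=linestyle, label=label)  # legend proxy

fig, ax = plt.subplots(figsize=(7, 5))
ax.scatter(X_a[:, 0], X_a[:, 1], s=15, c="tab:blue", label="Class A (clean)")
ax.scatter(X_b[:, 0], X_b[:, 1], s=15, c="tab:orange", label="Class B (clean)")
ax.scatter(X_integrity[:, 0], X_integrity[:, 1], c="red", marker="x",
           label="Integrity poison (labeled A)")
ax.scatter(X_avail[:, 0], X_avail[:, 1], c="gray", marker=".",
           label="Availability poison (random labels)")
ax.scatter(*x_target, c="black", marker="*", s=200, label="Target instance")

draw_boundary(X_clean, y_clean, ax, "black", "solid", "Clean boundary")
draw_boundary(np.vstack([X_clean, X_integrity]),
              np.concatenate([y_clean, y_integrity]),
              ax, "red", "dashed", "After integrity poisoning")
draw_boundary(np.vstack([X_clean, X_avail]),
              np.concatenate([y_clean, y_avail]),
              ax, "green", "dotted", "After availability poisoning")

ax.set_xlabel("feature 1")
ax.set_ylabel("feature 2")
ax.legend(loc="upper left", fontsize=8)
plt.tight_layout()
plt.show()
```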
Choosing between availability and integrity strategies depends heavily on the attacker's resources, knowledge of the target system, and ultimate objective. As we proceed, we will examine specific techniques for implementing both types of attacks, starting with more detailed methods for targeted data poisoning and then moving to the related concept of backdoor attacks.