Ensemble methods, which combine predictions from multiple individual models, are frequently employed not just to enhance predictive accuracy but also as a heuristic defense against adversarial attacks. The intuition is that an adversarial example crafted to fool one specific model might not be effective against others, especially if the ensemble members are diverse (e.g., different architectures, trained on different data subsets). However, ensembles are not impervious to evasion attacks. Attacking them requires specific strategies that account for the aggregated decision-making process.
Let's consider an ensemble $F$ composed of $M$ individual models $f_1, f_2, \dots, f_M$. The final prediction $F(x)$ is typically derived by combining the individual predictions $f_i(x)$, often through mechanisms like majority voting (for classification) or averaging predicted probabilities.
$$F(x) = \text{aggregate}\big(f_1(x), f_2(x), \dots, f_M(x)\big)$$

An attacker's goal is to find a perturbation $\delta$ such that $F(x+\delta) \neq F(x)$, while keeping the perturbation small, i.e., $\|\delta\|_p \le \epsilon$.
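To make the aggregation step concrete, the sketch below wraps a few hypothetical PyTorch classifiers and averages their softmax probabilities. The class name `ProbabilityAveragingEnsemble`, the member architectures, and the input shapes are illustrative assumptions, not a prescribed implementation.

```python
import torch
import torch.nn as nn

class ProbabilityAveragingEnsemble(nn.Module):
    """Aggregate member predictions F(x) by averaging softmax probabilities."""

    def __init__(self, members):
        super().__init__()
        self.members = nn.ModuleList(members)

    def forward(self, x):
        # Each member f_i produces logits; average their class probabilities.
        probs = [torch.softmax(f(x), dim=1) for f in self.members]
        return torch.stack(probs, dim=0).mean(dim=0)

# Illustrative members: three tiny linear classifiers over 28x28 inputs.
members = [nn.Sequential(nn.Flatten(), nn.Linear(784, 10)) for _ in range(3)]
ensemble = ProbabilityAveragingEnsemble(members)

x = torch.rand(4, 1, 28, 28)      # a batch of placeholder "images"
print(ensemble(x).argmax(dim=1))  # F(x): aggregated class predictions
```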
Several approaches can be employed to generate adversarial examples against ensemble models:
Leveraging Transferability: As discussed in the section on transferability, adversarial examples crafted for one model often have some effectiveness against other models, particularly those trained on similar data or having similar architectures. An attacker can generate an adversarial example targeting a single, potentially weaker, member $f_i$ of the ensemble and hope it transfers sufficiently to fool the aggregate decision $F$. Alternatively, an attacker might train their own surrogate model that mimics the ensemble's behavior (if the ensemble is black-box) or use a known, accessible model from the ensemble as a proxy. The attack is then performed on this surrogate or proxy model. While simpler, this approach might be less effective than attacks directly targeting the ensemble. A rough sketch of the transfer-based approach follows below.
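The snippet below, continuing the ensemble sketch above, crafts an FGSM perturbation using gradients from only the first member and then checks whether it fools the aggregated prediction. The helper name `fgsm_on_member`, the placeholder labels, and the budget of 0.03 are illustrative assumptions.

```python
import torch
import torch.nn.functional as F  # the functional module, not the ensemble F from the text

def fgsm_on_member(member, x, y, epsilon):
    """Craft an L-infinity FGSM example using gradients of a single member only."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(member(x_adv), y)
    loss.backward()
    # One signed-gradient step, then clip back to the valid pixel range.
    return (x + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

y_true = torch.randint(0, 10, (x.size(0),))  # placeholder labels for illustration
x_adv = fgsm_on_member(members[0], x, y_true, epsilon=0.03)

# Does the example crafted against f_1 transfer to the aggregated decision F?
fooled = ensemble(x_adv).argmax(dim=1) != y_true
print(f"Transfer success rate against the ensemble: {fooled.float().mean():.2%}")
```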
Optimization Against Aggregate Output: A more direct approach involves formulating the attack as an optimization problem that targets the ensemble's combined output.
Attacking All Members Simultaneously: An attacker can try to find a single perturbation $\delta$ that is effective against all, or at least a majority, of the individual models $f_i$. This often involves modifying the objective function in optimization-based attacks. For example, instead of minimizing the loss for one model, the objective could be to minimize the sum or the maximum of the losses across all ensemble members:
$$\min_{\delta} \sum_{i=1}^{M} L\big(f_i(x+\delta),\, y_{\text{target}}\big) \quad \text{or} \quad \min_{\delta} \max_{i}\, L\big(f_i(x+\delta),\, y_{\text{target}}\big), \quad \text{subject to } \|\delta\|_p \le \epsilon.$$

This generally increases the computational cost of the attack significantly, as gradients must be computed for every model in the ensemble at each optimization step.
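A minimal sketch of this idea, assuming PyTorch and the ensemble members defined earlier, is a targeted PGD loop that descends on the summed cross-entropy toward $y_{\text{target}}$ across all members while projecting $\delta$ onto the $\ell_\infty$ ball. The function name, step size, and iteration count are illustrative choices, not a fixed recipe.

```python
import torch
import torch.nn.functional as F

def pgd_ensemble_targeted(members, x, y_target, epsilon, alpha=0.01, steps=40):
    """Targeted L-infinity PGD minimizing the summed loss over all ensemble members."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        # Sum of per-member losses toward the target class (the objective above).
        loss = sum(F.cross_entropy(f(x + delta), y_target) for f in members)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()        # descend on the summed loss
            delta.clamp_(-epsilon, epsilon)           # project onto the L-infinity ball
            delta.copy_((x + delta).clamp(0, 1) - x)  # keep x + delta a valid image
        delta.grad.zero_()
    return (x + delta).detach()

# Illustrative usage: push every input toward class 0 against all three members.
y_target = torch.zeros(x.size(0), dtype=torch.long)
x_adv = pgd_ensemble_targeted(members, x, y_target, epsilon=0.03)
print(ensemble(x_adv).argmax(dim=1))  # F(x_adv): how many landed on the target class?
```

Using the sum of losses attacks the "average" member, while minimizing the maximum loss forces even the most resistant member toward the target; both require a full forward and backward pass through every member per step.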
Process of attacking an ensemble model. The attacker searches for a perturbation $\delta$ to create $x_{\text{adv}}$, which is fed into multiple models ($f_1, f_2, f_3$). The individual outputs are combined by an aggregation mechanism, and the goal is to make the final ensemble prediction $F(x_{\text{adv}})$ incorrect.
Attacking ensembles is generally more computationally expensive than attacking single models, especially when optimizing against the aggregate output or targeting all members simultaneously. The effectiveness of the attack often depends on the diversity of the ensemble members. Ensembles composed of highly similar models might not offer much additional resilience compared to a single model. Conversely, highly diverse ensembles can be significantly more challenging to attack successfully using a single small perturbation.
When evaluating the robustness of an ensemble defense, it's important to use attacks specifically designed for ensembles, rather than relying solely on the transferability of attacks generated against individual members. As we will see in Chapter 6, evaluating defenses requires adaptive attacks tailored to the specific defense mechanism being tested.
The next section provides a hands-on practical session where you'll implement some of the evasion attacks discussed in this chapter, potentially including basic ensemble attack concepts.