While adversarial training aims to empirically improve robustness by exposing the model to attacks during training, it typically lacks formal guarantees. If you need provable assurances that your model cannot be fooled by any perturbation within a specific magnitude, you need to explore certified defenses. Randomized smoothing stands out as a practical and widely applicable technique for achieving such guarantees, particularly against ℓ2-norm bounded perturbations.
The core idea is surprisingly straightforward: instead of relying on the predictions of your original, potentially complex base classifier f(x) directly, you use a smoothed version, let's call it g(x). This smoothed classifier g doesn't just evaluate f at the input x. Instead, it considers the behavior of f in a neighborhood around x, specifically by adding random noise (typically Gaussian) to the input many times and observing the collective outcome.
Formally, given a base classifier f (which could be any model, like a neural network), the smoothed classifier g(x) is defined as the class c that is most likely to be returned by f when the input x is perturbed by isotropic Gaussian noise ϵ ∼ N(0, σ²I):
g(x) = argmax_c P(f(x + ϵ) = c)

Here, σ is a hyperparameter representing the standard deviation of the noise. It controls the degree of smoothing: a larger σ means more smoothing, i.e. the prediction accounts for a wider neighborhood around the input x.
In practice, we can't compute this probability exactly. Instead, we estimate it using Monte Carlo sampling:
The prediction process for a smoothed classifier g(x). The input x is perturbed multiple times with Gaussian noise, passed through the base classifier f, and the final prediction is determined by a majority vote over the results.
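As a concrete illustration, here is a minimal sketch of this estimation step in Python. It is not the full PREDICT procedure from Cohen et al. (which additionally includes a statistical test and an abstain option); it assumes a hypothetical `base_classifier` callable that maps a batch of inputs to integer class labels, and simply takes the majority vote over noisy copies of x.

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma, n_samples=1000, num_classes=10):
    """Monte Carlo estimate of the smoothed classifier g(x).

    base_classifier: assumed callable mapping an array of shape
                     (n_samples, *x.shape) to integer class labels.
    """
    # Draw n_samples noisy copies of the input: x + eps, eps ~ N(0, sigma^2 I)
    noise = np.random.randn(n_samples, *x.shape) * sigma
    noisy_inputs = x[None, ...] + noise

    # Classify every noisy copy with the base classifier f
    labels = base_classifier(noisy_inputs)

    # Majority vote approximates argmax_c P(f(x + eps) = c)
    counts = np.bincount(labels, minlength=num_classes)
    return int(np.argmax(counts)), counts
```

The per-class counts returned here are exactly the Monte Carlo counts that the certification step below relies on.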
The beauty of randomized smoothing lies in its ability to provide a certificate. If the smoothed classifier g(x) predicts a class cA, we can calculate a radius R such that for any perturbation δ with ∣∣δ∣∣2≤R, the smoothed classifier g(x+δ) will still predict class cA. This provides a guarantee against any ℓ2 attack within that radius.
The theorem underpinning this (from Cohen, Rosenfeld, and Kolter, 2019) connects the certified radius R directly to the noise level σ and the probability with which the base classifier f predicts the majority class cA under noise. Let pA be a lower bound (obtained with high confidence, typically using methods like the Clopper-Pearson interval from the Monte Carlo counts) on the true probability P(f(x+ϵ)=cA). The certified radius R is then given by:
R = σ Φ⁻¹(pA)

Here, Φ⁻¹ is the inverse cumulative distribution function (CDF) of the standard Gaussian distribution N(0,1). Intuitively, if the base classifier f is highly consistent in predicting class cA even when significant noise (σ) is added (meaning pA is close to 1), then the smoothed classifier g is certifiably robust within a larger radius R. If the predictions are less consistent (pA closer to 0.5), the certified radius shrinks. And if pA ≤ 0.5, then Φ⁻¹(pA) ≤ 0, so no positive radius can be certified; in practice the smoothed classifier abstains in this case.
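Continuing the sketch, the certification arithmetic takes only a few lines, again under the assumption that we have per-class Monte Carlo counts (e.g. from the `smoothed_predict` sketch above). The Clopper-Pearson lower bound is obtained directly from the Beta distribution, and scipy's `norm.ppf` plays the role of Φ⁻¹.

```python
import numpy as np
from scipy.stats import beta, norm

def certified_radius(counts, sigma, alpha=0.001):
    """Certified L2 radius for the top class: R = sigma * Phi^{-1}(pA_lower).

    counts: per-class Monte Carlo counts from the smoothed prediction step.
    alpha:  failure probability of the certificate (it holds w.p. 1 - alpha).
    """
    n = counts.sum()
    top_class = int(np.argmax(counts))
    k = counts[top_class]

    # One-sided Clopper-Pearson lower confidence bound on P(f(x + eps) = cA)
    p_a_lower = beta.ppf(alpha, k, n - k + 1) if k > 0 else 0.0

    if p_a_lower <= 0.5:
        return top_class, 0.0   # no guarantee can be issued; abstain in practice
    return top_class, sigma * norm.ppf(p_a_lower)
```

As a quick sanity check on the formula: with σ = 0.25 and a lower bound pA = 0.99, the certified radius is 0.25 · Φ⁻¹(0.99) ≈ 0.58 in ℓ2 norm.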
Randomized smoothing offers several appealing properties:
- It is architecture-agnostic: the base classifier f can be any model, including large neural networks, because the method only needs its predictions under noise.
- It scales to high-dimensional inputs and modern architectures where exact verification methods become computationally infeasible.
- The resulting guarantee covers every ℓ2 perturbation within the radius R, not just the specific attacks used during training or evaluation.
However, it also comes with trade-offs:
- Inference is expensive: every prediction and every certificate requires hundreds to thousands of forward passes through the base classifier.
- The certificate is probabilistic, holding with confidence 1 − α rather than deterministically (though α can be made very small).
- The guarantee in this basic form is specific to the ℓ2 norm; other perturbation types require different smoothing distributions or analyses.
- Adding noise degrades clean accuracy, and the base classifier generally needs to be trained with Gaussian noise augmentation to perform well under smoothing.
The noise level σ is the most significant hyperparameter. A larger σ makes larger certified radii attainable (since R = σΦ⁻¹(pA)), but it also makes classification under noise harder, which lowers both pA and clean accuracy. A smaller σ preserves clean accuracy but caps the radius that can be certified, even when the base classifier is very consistent under noise.
The number of Monte Carlo samples n affects both prediction and certification (see the sketch after this list):
- For prediction, more samples reduce the chance that sampling noise flips the majority vote.
- For certification, more samples tighten the Clopper-Pearson lower bound on pA, which directly enlarges the certifiable radius; even if every sample agrees on the top class, the lower bound cannot exceed α^(1/n).
- The cost grows linearly with n, since each sample is one forward pass of the base classifier, so certification is far slower than standard inference.
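To make the dependence on n concrete, the following illustrative snippet (not part of the original algorithm) computes the best-case certified radius, i.e. the radius obtained when all n noisy samples return the top class, in which case the Clopper-Pearson lower bound equals α^(1/n).

```python
from scipy.stats import norm

def max_certifiable_radius(n, sigma, alpha=0.001):
    """Best-case radius when all n Monte Carlo samples agree on the top class.

    With k = n successes, the Clopper-Pearson lower bound equals alpha**(1/n),
    so the radius is capped at sigma * Phi^{-1}(alpha**(1/n)).
    """
    p_a_lower = alpha ** (1.0 / n)
    return sigma * norm.ppf(p_a_lower) if p_a_lower > 0.5 else 0.0

for n in (100, 1_000, 10_000, 100_000):
    print(n, round(max_certifiable_radius(n, sigma=0.5), 3))
```

Even in this best case the radius grows only slowly with n, which is why certification typically uses tens or hundreds of thousands of samples to reach large radii.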
Typical trade-off between clean accuracy and certified accuracy (at a fixed radius R) as the noise level σ increases. Higher noise often boosts certified accuracy up to a point, but consistently decreases clean accuracy.
Randomized smoothing represents a significant step towards building ML systems with verifiable security properties. While not a silver bullet, it provides a practical framework for obtaining provable robustness guarantees, complementing empirical defenses like adversarial training by offering a different kind of assurance grounded in statistical certification. Understanding its mechanism, benefits, and limitations is essential for deploying robust machine learning models in security-sensitive applications.