While adversarial training aims to empirically improve robustness by exposing the model to attacks during training, it typically lacks formal guarantees. If you need provable assurances that your model cannot be fooled by any perturbation within a specific magnitude, you need to explore certified defenses. Randomized smoothing stands out as a practical and widely applicable technique for achieving such guarantees, particularly against $\ell_2$-norm bounded perturbations.
The core idea is surprisingly straightforward: instead of relying directly on the predictions of your original, potentially complex base classifier $f$, you use a smoothed version, call it $g$. This smoothed classifier doesn't just evaluate $f$ at the input $x$. Instead, it considers the behavior of $f$ in a neighborhood around $x$, specifically by adding random noise (typically Gaussian) to the input many times and observing the collective outcome.
Formally, given a base classifier $f$ (which could be any model, such as a neural network), the smoothed classifier $g$ is defined as returning the class that $f$ is most likely to return when the input is perturbed by isotropic Gaussian noise $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$:

$$g(x) = \arg\max_{c} \; \mathbb{P}_{\varepsilon \sim \mathcal{N}(0, \sigma^2 I)}\big[f(x + \varepsilon) = c\big]$$
Here, $\sigma$ is a hyperparameter representing the standard deviation of the noise. It controls the degree of smoothing: a larger $\sigma$ means more smoothing, considering a wider area around the input $x$.
In practice, we can't compute this probability exactly. Instead, we estimate it using Monte Carlo sampling: draw $n$ i.i.d. noise vectors $\varepsilon_1, \dots, \varepsilon_n \sim \mathcal{N}(0, \sigma^2 I)$, classify each noisy copy with $f$, and return the most frequent class:

$$\hat{g}(x) = \arg\max_{c} \sum_{i=1}^{n} \mathbb{1}\big[f(x + \varepsilon_i) = c\big]$$
The prediction process for a smoothed classifier $g$. The input $x$ is perturbed multiple times with Gaussian noise, passed through the base classifier $f$, and the final prediction is determined by a majority vote over the results.
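The sketch below shows how this majority-vote prediction might look in code. It is a minimal illustration under stated assumptions, not a reference implementation: `smoothed_predict`, `base_classifier`, and the other names are hypothetical, and the base classifier is assumed to map a batch of inputs to integer class labels.

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma, n_samples, num_classes, rng=None):
    """Estimate g(x) by a majority vote over noisy copies of x.

    base_classifier: a callable mapping an array of shape (n, ...) to an
    array of n integer class labels. All names here are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Draw n_samples noisy copies x + eps with eps ~ N(0, sigma^2 I).
    noise = rng.normal(scale=sigma, size=(n_samples,) + x.shape)
    noisy_inputs = x[None, ...] + noise
    # Classify every noisy copy with the base classifier f.
    labels = base_classifier(noisy_inputs)
    # The vote counts estimate P[f(x + eps) = c] for each class c.
    counts = np.bincount(labels, minlength=num_classes)
    return int(np.argmax(counts)), counts
```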
The beauty of randomized smoothing lies in its ability to provide a certificate. If the smoothed classifier predicts a class $c_A$ at an input $x$, we can calculate a radius $R$ such that for any perturbation $\delta$ with $\|\delta\|_2 < R$, the smoothed classifier will still predict class $c_A$; that is, $g(x + \delta) = g(x)$. This provides a guarantee against any attack within that radius.
The theorem underpinning this (from Cohen, Rosenfeld, and Kolter, 2019) connects the certified radius directly to the noise level $\sigma$ and the probability with which the base classifier predicts the majority class under noise. Let $\underline{p_A}$ be a lower bound (obtained with high confidence, typically using methods like the Clopper-Pearson interval from the Monte Carlo counts) on the true probability $p_A = \mathbb{P}_{\varepsilon}\big[f(x + \varepsilon) = c_A\big]$. The certified radius is then given by:

$$R = \sigma \, \Phi^{-1}\!\big(\underline{p_A}\big)$$
Here, $\Phi^{-1}$ is the inverse cumulative distribution function (CDF) of the standard Gaussian distribution $\mathcal{N}(0, 1)$. Intuitively, if the base classifier is highly consistent in predicting class $c_A$ even when significant noise ($\sigma$) is added (meaning $\underline{p_A}$ is close to 1), then the smoothed classifier is provably robust within a larger radius $R$. If the predictions are less consistent ($\underline{p_A}$ is closer to 0.5), the certified radius shrinks. Note that if $\underline{p_A} \le 0.5$, the radius is zero or undefined, meaning no guarantee can be provided.
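As a sketch of how this certificate might be computed from the Monte Carlo counts, the function below combines a one-sided Clopper-Pearson lower bound on $p_A$ with the radius formula above. It uses SciPy's Beta and Gaussian distributions; `certified_radius` and its arguments are illustrative names introduced here, not part of any library API.

```python
from scipy.stats import beta, norm

def certified_radius(count_top, n_samples, sigma, alpha=0.001):
    """Certified L2 radius from Monte Carlo counts (illustrative sketch).

    count_top: how many of the n_samples noisy votes went to the top class.
    alpha: allowed probability that the certificate is wrong.
    Returns 0.0 when no guarantee can be made (lower bound on p_A <= 1/2).
    """
    # One-sided (1 - alpha) Clopper-Pearson lower bound on p_A.
    if count_top == 0:
        p_a_lower = 0.0
    else:
        p_a_lower = beta.ppf(alpha, count_top, n_samples - count_top + 1)
    if p_a_lower <= 0.5:
        return 0.0  # no certificate; the smoothed classifier should abstain
    # R = sigma * Phi^{-1}(lower bound on p_A)
    return float(sigma * norm.ppf(p_a_lower))
```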
Randomized smoothing offers several appealing properties:

- It is agnostic to the base classifier: $f$ can be any model, including large neural networks, because only its predictions on noisy inputs are needed.
- It yields a provable $\ell_2$ robustness certificate per input, rather than only empirical resistance to the attacks seen during training.
- Prediction and certification require only forward passes through $f$, so the method remains practical at scale.
However, it also comes with trade-offs:

- Inference is expensive: each prediction or certificate requires many noisy forward passes through the base classifier.
- Injecting noise typically lowers clean (unperturbed) accuracy, especially at large $\sigma$.
- The guarantee applies only to $\ell_2$-bounded perturbations; it says nothing about other threat models.
- When the noisy votes are too evenly split, the smoothed classifier must abstain rather than certify.
The noise level $\sigma$ is the most significant hyperparameter: larger values can certify larger radii, but they also make the base classifier's task under noise harder, which hurts accuracy.
The number of Monte Carlo samples, $n$, affects both prediction and certification: more samples give a tighter lower bound $\underline{p_A}$ (and therefore a larger certified radius), at the cost of more forward passes through $f$, as the short example below illustrates.
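To make the effect of $n$ concrete, here is a small usage example reusing the hypothetical `certified_radius` helper sketched earlier. With the same 99% empirical vote share, a larger sample count tightens the Clopper-Pearson bound and therefore enlarges the certified radius, at the price of proportionally more forward passes.

```python
# Assumes the certified_radius sketch defined earlier in this section.
for n in (100, 1_000, 10_000):
    k = int(0.99 * n)  # 99% of the noisy votes go to the top class
    print(f"n={n:>6}  R={certified_radius(k, n, sigma=0.5):.3f}")
```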
Typical trade-off between clean accuracy and certified accuracy (at a fixed radius $r$) as the noise level $\sigma$ increases. Higher noise often boosts certified accuracy up to a point, but consistently decreases clean accuracy.
Randomized smoothing represents a significant step towards building ML systems with verifiable security properties. While not a silver bullet, it provides a practical framework for obtaining provable robustness guarantees, complementing empirical defenses like adversarial training by offering a different kind of assurance grounded in statistical certification. Understanding its mechanism, benefits, and limitations is essential for deploying robust machine learning models in security-sensitive applications.