While adversarial training aims to empirically improve robustness by exposing the model to attacks during training, it typically lacks formal guarantees. If you need provable assurances that your model cannot be fooled by any perturbation within a specific magnitude, you need to explore certified defenses. Randomized smoothing stands out as a practical and widely applicable technique for achieving such guarantees, particularly against ℓ2-norm bounded perturbations.
The core idea is surprisingly straightforward: instead of relying on the predictions of your original, potentially complex base classifier f(x) directly, you use a smoothed version, let's call it g(x). This smoothed classifier g doesn't just evaluate f at the input x. Instead, it considers the behavior of f in a neighborhood around x, specifically by adding random noise (typically Gaussian) to the input many times and observing the collective outcome.
Formally, given a base classifier f (which could be any model, like a neural network), the smoothed classifier g(x) is defined as the class c that is most likely to be returned by f when the input x is perturbed by isotropic Gaussian noise ϵ ∼ N(0, σ²I):
g(x) = argmax_c P(f(x + ϵ) = c)

Here, σ is a hyperparameter representing the standard deviation of the noise. It controls the degree of smoothing: a larger σ means more smoothing, i.e. the prediction accounts for a wider neighborhood around the input x.
In practice, we can't compute this probability exactly. Instead, we estimate it using Monte Carlo sampling:
The prediction process for a smoothed classifier g(x). The input x is perturbed multiple times with Gaussian noise, passed through the base classifier f, and the final prediction is determined by a majority vote over the results.
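As a concrete illustration, here is a minimal sketch of this estimation step in Python. It is not the full PREDICT procedure from Cohen et al. (which additionally includes a statistical test and an abstain option); it assumes a hypothetical `base_classifier` callable that maps a batch of inputs to integer class labels, and simply takes the majority vote over noisy copies of x.

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma, n_samples=1000, num_classes=10):
    """Monte Carlo estimate of the smoothed classifier g(x).

    base_classifier: assumed callable mapping an array of shape
                     (n_samples, *x.shape) to integer class labels.
    """
    # Draw n_samples noisy copies of the input: x + eps, eps ~ N(0, sigma^2 I)
    noise = np.random.randn(n_samples, *x.shape) * sigma
    noisy_inputs = x[None, ...] + noise

    # Classify every noisy copy with the base classifier f
    labels = base_classifier(noisy_inputs)

    # Majority vote approximates argmax_c P(f(x + eps) = c)
    counts = np.bincount(labels, minlength=num_classes)
    return int(np.argmax(counts)), counts
```

The per-class counts returned here are exactly the Monte Carlo counts that the certification step below relies on.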
The beauty of randomized smoothing lies in its ability to provide a certificate. If the smoothed classifier g(x) predicts a class cA, we can calculate a radius R such that for any perturbation δ with ∣∣δ∣∣2≤R, the smoothed classifier g(x+δ) will still predict class cA. This provides a guarantee against any ℓ2 attack within that radius.
The theorem underpinning this (from Cohen, Rosenfeld, and Kolter, 2019) connects the certified radius R directly to the noise level σ and the probability with which the base classifier f predicts the majority class cA under noise. Let pA be a lower bound (obtained with high confidence, typically using methods like the Clopper-Pearson interval from the Monte Carlo counts) on the true probability P(f(x+ϵ)=cA). The certified radius R is then given by:
R = σ Φ⁻¹(pA)

Here, Φ⁻¹ is the inverse cumulative distribution function (CDF) of the standard Gaussian distribution N(0,1). Intuitively, if the base classifier f is highly consistent in predicting class cA even when significant noise (σ) is added (meaning pA is close to 1), then the smoothed classifier g is certifiably robust within a larger radius R. If the predictions are less consistent (pA closer to 0.5), the certified radius shrinks. And if pA ≤ 0.5, then Φ⁻¹(pA) ≤ 0, so no positive radius can be certified; in practice the smoothed classifier abstains in this case.
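Continuing the sketch, the certification arithmetic takes only a few lines, again under the assumption that we have per-class Monte Carlo counts (e.g. from the `smoothed_predict` sketch above). The Clopper-Pearson lower bound is obtained directly from the Beta distribution, and scipy's `norm.ppf` plays the role of Φ⁻¹.

```python
import numpy as np
from scipy.stats import beta, norm

def certified_radius(counts, sigma, alpha=0.001):
    """Certified L2 radius for the top class: R = sigma * Phi^{-1}(pA_lower).

    counts: per-class Monte Carlo counts from the smoothed prediction step.
    alpha:  failure probability of the certificate (it holds w.p. 1 - alpha).
    """
    n = counts.sum()
    top_class = int(np.argmax(counts))
    k = counts[top_class]

    # One-sided Clopper-Pearson lower confidence bound on P(f(x + eps) = cA)
    p_a_lower = beta.ppf(alpha, k, n - k + 1) if k > 0 else 0.0

    if p_a_lower <= 0.5:
        return top_class, 0.0   # no guarantee can be issued; abstain in practice
    return top_class, sigma * norm.ppf(p_a_lower)
```

As a quick sanity check on the formula: with σ = 0.25 and a lower bound pA = 0.99, the certified radius is 0.25 · Φ⁻¹(0.99) ≈ 0.58 in ℓ2 norm.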
Randomized smoothing offers several appealing properties:
- It is architecture-agnostic: the base classifier f can be any model, including large neural networks, because the method only needs its predictions under noise.
- It scales to high-dimensional inputs and modern architectures where exact verification methods become computationally infeasible.
- The resulting guarantee covers every ℓ2 perturbation within the radius R, not just the specific attacks used during training or evaluation.
However, it also comes with trade-offs:
- Inference is expensive: every prediction and every certificate requires hundreds to thousands of forward passes through the base classifier.
- The certificate is probabilistic, holding with confidence 1 − α rather than deterministically (though α can be made very small).
- The guarantee in this basic form is specific to the ℓ2 norm; other perturbation types require different smoothing distributions or analyses.
- Adding noise degrades clean accuracy, and the base classifier generally needs to be trained with Gaussian noise augmentation to perform well under smoothing.
The noise level σ is the most significant hyperparameter. A larger σ makes larger certified radii attainable (since R = σΦ⁻¹(pA)), but it also makes classification under noise harder, which lowers both pA and clean accuracy. A smaller σ preserves clean accuracy but caps the radius that can be certified, even when the base classifier is very consistent under noise.
The number of Monte Carlo samples n affects both prediction and certification (see the sketch after this list):
- For prediction, more samples reduce the chance that sampling noise flips the majority vote.
- For certification, more samples tighten the Clopper-Pearson lower bound on pA, which directly enlarges the certifiable radius; even if every sample agrees on the top class, the lower bound cannot exceed α^(1/n).
- The cost grows linearly with n, since each sample is one forward pass of the base classifier, so certification is far slower than standard inference.
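To make the dependence on n concrete, the following illustrative snippet (not part of the original algorithm) computes the best-case certified radius, i.e. the radius obtained when all n noisy samples return the top class, in which case the Clopper-Pearson lower bound equals α^(1/n).

```python
from scipy.stats import norm

def max_certifiable_radius(n, sigma, alpha=0.001):
    """Best-case radius when all n Monte Carlo samples agree on the top class.

    With k = n successes, the Clopper-Pearson lower bound equals alpha**(1/n),
    so the radius is capped at sigma * Phi^{-1}(alpha**(1/n)).
    """
    p_a_lower = alpha ** (1.0 / n)
    return sigma * norm.ppf(p_a_lower) if p_a_lower > 0.5 else 0.0

for n in (100, 1_000, 10_000, 100_000):
    print(n, round(max_certifiable_radius(n, sigma=0.5), 3))
```

Even in this best case the radius grows only slowly with n, which is why certification typically uses tens or hundreds of thousands of samples to reach large radii.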
Typical trade-off between clean accuracy and certified accuracy (at a fixed radius R) as the noise level σ increases. Higher noise often boosts certified accuracy up to a point, but consistently decreases clean accuracy.
Randomized smoothing represents a significant step towards building ML systems with verifiable security properties. While not a silver bullet, it provides a practical framework for obtaining provable robustness guarantees, complementing empirical defenses like adversarial training by offering a different kind of assurance grounded in statistical certification. Understanding its mechanism, benefits, and limitations is essential for deploying robust machine learning models in security-sensitive applications.