As introduced earlier, applying a differential privacy mechanism like noise addition in a single round of federated learning provides an $(\epsilon, \delta)$ guarantee for that specific interaction. However, federated learning typically involves numerous communication rounds, sometimes hundreds or thousands. Each round potentially leaks a small amount of information. A critical question arises: how does the overall privacy loss accumulate across these multiple rounds? Simply stating the per-round guarantee is insufficient; we need to understand the total privacy cost over the entire training process. This is where composition theorems become essential.
Imagine repeatedly querying a database (or in our case, repeatedly receiving noisy aggregated updates derived from client data) using DP mechanisms. Each query introduces some privacy loss. Composition theorems provide bounds on the total privacy loss after a sequence of such queries or mechanisms.
The simplest way to analyze the combined effect is through basic sequential composition. It states that if you apply $k$ mechanisms, where the $i$-th mechanism is $(\epsilon_i, \delta_i)$-differentially private, the sequence of all $k$ mechanisms is $\left(\sum_{i=1}^{k} \epsilon_i,\; \sum_{i=1}^{k} \delta_i\right)$-differentially private.
If all mechanisms are identical, providing $(\epsilon, \delta)$-DP each time, then after $k$ rounds the total privacy loss is bounded by $(k\epsilon, k\delta)$:

$$\epsilon_{\text{total}} \le k\,\epsilon_{\text{round}}, \qquad \delta_{\text{total}} \le k\,\delta_{\text{round}}$$

While simple and intuitive, this linear accumulation often results in a loose upper bound. In a typical FL scenario with many rounds (large $k$), $k\epsilon$ can quickly become very large, suggesting a weak overall privacy guarantee unless the per-round $\epsilon_{\text{round}}$ is made extremely small. Making $\epsilon_{\text{round}}$ too small, however, often requires adding excessive noise, significantly degrading model utility. For practical FL systems needing hundreds of rounds, basic composition is often too pessimistic.
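To make the bookkeeping concrete, here is a minimal Python sketch of basic composition; the function name and example parameters are illustrative, not taken from any particular library.

```python
def basic_composition(eps_round: float, delta_round: float, k: int) -> tuple[float, float]:
    """Total (epsilon, delta) after k-fold basic sequential composition."""
    return k * eps_round, k * delta_round

# 500 rounds at eps=0.1 per round already reaches a total epsilon of 50,
# illustrating how pessimistic the linear bound is for long FL training runs.
print(basic_composition(0.1, 1e-5, 500))  # ~ (50.0, 0.005)
```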
Fortunately, tighter bounds exist under the framework of advanced composition. These theorems exploit the fact that privacy losses do not accumulate linearly in the worst possible way; in particular, the total $\epsilon$ can grow roughly with $\sqrt{k}$ rather than with $k$.
A standard result in advanced composition (the classical theorem; the moments accountant and Rényi Differential Privacy yield even tighter, mechanism-specific bounds) shows that for a sequence of $k$ mechanisms, each satisfying $(\epsilon, \delta)$-DP, the combined mechanism satisfies $(\epsilon', \delta_{\text{total}})$-DP for any slack $\delta' > 0$, where:
$$\epsilon' = \sqrt{2k \ln(1/\delta')}\;\epsilon + k\,\epsilon\,(e^{\epsilon} - 1), \qquad \delta_{\text{total}} = k\delta + \delta'$$

Let's break down the expression for $\epsilon'$: the first term grows only with $\sqrt{k}$, purchased at the price of the extra slack $\delta'$, while the second term is linear in $k$ but proportional to $\epsilon(e^{\epsilon} - 1) \approx \epsilon^2$ for small $\epsilon$, so it stays modest as long as the per-round $\epsilon$ is small.
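A direct transcription of this bound into Python might look like the following sketch (names are ours, not from a library):

```python
import math

def advanced_composition(eps_round: float, delta_round: float,
                         k: int, delta_slack: float) -> tuple[float, float]:
    """Total (epsilon, delta) after k-fold advanced composition with slack delta_slack."""
    eps_total = (math.sqrt(2 * k * math.log(1 / delta_slack)) * eps_round
                 + k * eps_round * (math.exp(eps_round) - 1))
    return eps_total, k * delta_round + delta_slack

print(advanced_composition(0.1, 1e-5, 500, 1e-6))  # ~ (17.0, 0.005)
```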
The implication is significant: advanced composition allows for a much more favorable accumulation of privacy loss, especially for the $\epsilon$ parameter. We can afford a larger per-round $\epsilon_{\text{round}}$ for the same total budget $\epsilon_{\text{total}}$ compared to basic composition, leading to potentially better model utility.
Figure: Comparison of total privacy loss ($\epsilon$) accumulation over $k$ rounds using basic (linear) and advanced (sublinear) composition, assuming $\epsilon_{\text{round}} = 0.1$, $\delta_{\text{round}} = 10^{-5}$, and slack $\delta' = 10^{-6}$ for the advanced calculation. Advanced composition yields significantly tighter bounds for larger $k$.
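A short script with the caption's parameters reproduces the comparison. Note that for small $k$ the advanced bound is actually looser than the basic one, so in practice one takes the minimum of the two:

```python
import math

eps_r, delta_slack = 0.1, 1e-6
for k in (10, 100, 500, 1000):
    basic = k * eps_r
    advanced = (math.sqrt(2 * k * math.log(1 / delta_slack)) * eps_r
                + k * eps_r * (math.exp(eps_r) - 1))
    print(f"k={k:4d}  basic eps={basic:6.1f}  advanced eps={advanced:6.2f}")
# k=  10  basic eps=   1.0  advanced eps=  1.77
# k= 100  basic eps=  10.0  advanced eps=  6.31
# k= 500  basic eps=  50.0  advanced eps= 17.01
# k=1000  basic eps= 100.0  advanced eps= 27.14
```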
Modern privacy accounting methods, often based on Rényi Differential Privacy (RDP), provide frameworks for precisely tracking privacy loss under composition. RDP measures privacy loss as a function of a Rényi order $\alpha$; these $\alpha$-indexed losses add up exactly under composition and can be converted back into an $(\epsilon, \delta)$-DP guarantee at the end. Tools like Google's dp-accounting library implement these techniques, allowing developers to accurately compute the $(\epsilon, \delta)$-DP guarantee for complex algorithms like DP-FedAvg run over multiple rounds.
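As a sketch, accounting for many rounds of Gaussian-noised aggregation with client subsampling might look like this (class and method names follow recent versions of the dp-accounting package; the noise multiplier, sampling rate, and round count are illustrative values, not recommendations):

```python
import dp_accounting

noise_multiplier = 1.1   # Gaussian noise std relative to the L2 clipping norm (illustrative)
sampling_rate = 0.01     # fraction of clients sampled per round (illustrative)
num_rounds = 500
target_delta = 1e-5

# One round: Poisson-subsample clients, then add Gaussian noise to the aggregate.
round_event = dp_accounting.PoissonSampledDpEvent(
    sampling_rate, dp_accounting.GaussianDpEvent(noise_multiplier))

accountant = dp_accounting.rdp.RdpAccountant()
accountant.compose(round_event, num_rounds)  # compose the round event num_rounds times

print(accountant.get_epsilon(target_delta))  # total epsilon at the target delta
```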
The total acceptable privacy loss for the entire federated learning process is known as the privacy budget, typically denoted $(\epsilon_{\text{total}}, \delta_{\text{total}})$. This budget represents the maximum cumulative privacy leakage we are willing to tolerate. Choosing appropriate values for $\epsilon_{\text{total}}$ and $\delta_{\text{total}}$ depends heavily on the sensitivity of the data, the application context, and any regulatory requirements (like GDPR or HIPAA). Common practice often aims for $\epsilon_{\text{total}}$ in the single digits (e.g., 1 to 10) and $\delta_{\text{total}}$ to be very small (e.g., less than $1/N_{\text{clients}}$, where $N_{\text{clients}}$ is the number of clients).
Once a total budget $(\epsilon_{\text{total}}, \delta_{\text{total}})$ is set for, say, $T$ communication rounds, the task becomes budget allocation: deciding how much privacy loss $(\epsilon_{\text{round}}, \delta_{\text{round}})$ is permissible in each round such that the cumulative loss, computed via a composition theorem, remains within the budget.
Several strategies exist:

- Uniform allocation: split the budget evenly across rounds, either naively ($\epsilon_{\text{round}} = \epsilon_{\text{total}}/T$ under basic composition) or by solving the advanced composition bound for the largest per-round $\epsilon$ that fits, as sketched below.
- Accountant-driven calibration: fix the per-round mechanism (clipping norm and noise multiplier), then use a privacy accountant to check the cumulative $(\epsilon, \delta)$ after $T$ rounds, increasing the noise or reducing $T$ if the budget is exceeded.
- Non-uniform schedules: let some rounds spend more of the budget than others (e.g., less noise late in training), at the cost of more involved accounting.
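Here is a sketch of the first strategy, inverting the advanced composition bound numerically (the function names and the example budget are ours):

```python
import math

def advanced_eps(eps_round: float, k: int, delta_slack: float) -> float:
    """Total epsilon after k-fold advanced composition of an eps_round mechanism."""
    return (math.sqrt(2 * k * math.log(1 / delta_slack)) * eps_round
            + k * eps_round * (math.exp(eps_round) - 1))

def per_round_epsilon(eps_total: float, k: int, delta_slack: float) -> float:
    """Largest eps_round whose k-fold composition stays within eps_total.

    The bound increases monotonically in eps_round, so binary search works.
    """
    lo, hi = 0.0, eps_total
    for _ in range(100):  # 100 halvings give ample precision
        mid = (lo + hi) / 2
        if advanced_eps(mid, k, delta_slack) <= eps_total:
            lo = mid
        else:
            hi = mid
    return lo

# Example: a total budget of eps=8 over T=500 rounds leaves ~0.055 per round,
# roughly 3x more than the 8/500 = 0.016 that basic composition would allow.
print(per_round_epsilon(8.0, 500, 1e-6))
```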
Understanding and correctly applying composition theorems is fundamental for building federated learning systems with meaningful, quantifiable privacy guarantees over the entire training duration. Advanced composition and specialized accounting tools are indispensable for achieving practical privacy-utility trade-offs in multi-round federated processes.