While Federated Averaging (FedAvg) provides a simple baseline for aggregating client updates, its performance can degrade significantly in real-world federated networks characterized by systems heterogeneity. Clients often possess varying computational capabilities (CPU, memory), network bandwidths, and power constraints. This heterogeneity leads to variations in the amount of local computation each client can perform within a given communication round. Some clients might complete many local training epochs (E), while others only manage a few.
Standard FedAvg aggregates client models (or updates), typically weighted by the number of data points $n_k$. However, it doesn't explicitly account for the differing amounts of local work ($\tau_k$, the number of local gradient steps or updates performed by client $k$) done to arrive at those updates. When $\tau_k$ varies greatly across clients, FedAvg can suffer from:

- Biased aggregation: clients that perform more local steps produce larger raw updates and pull the global model disproportionately in their direction.
- Objective inconsistency: the averaged update no longer tracks the true global objective, which can slow convergence or destabilize training.
Federated Normalized Averaging (FedNova) tackles systems heterogeneity directly by normalizing client updates before aggregation. The central idea is to adjust each client's contribution to counteract the effect of varying local computations, aiming for an update that better reflects the average gradient direction across clients, irrespective of how many steps each took locally.
Instead of averaging the final local models or the raw updates $\Delta_k^t$, FedNova averages the updates normalized by the amount of local work performed. Let $\tau_k$ be the number of local steps (e.g., SGD updates) performed by client $k$ in round $t$. The local update submitted by client $k$ is $\Delta_k^t = w_k^{t+1} - w^t$.
FedNova computes the global model update as follows:
$$w^{t+1} = w^t + \sum_{k \in S_t} p_k \frac{\Delta_k^t}{\tau_k}$$

Here, $S_t$ is the set of clients participating in round $t$, and $p_k$ is the aggregation weight for client $k$ (commonly $p_k = n_k / \sum_{j \in S_t} n_j$, but other weightings are possible).
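This aggregation step can be expressed compactly in code. The sketch below is a minimal NumPy illustration of the normalized averaging described above, not a reference implementation: the function name `fednova_aggregate` and the `(delta, num_steps, num_samples)` tuple format are assumptions made for this example, and the full published FedNova algorithm additionally rescales by an effective step count, which is omitted here.

```python
import numpy as np

def fednova_aggregate(w_global, client_results):
    """Normalized (FedNova-style) aggregation -- minimal sketch.

    client_results: list of (delta, num_steps, num_samples) tuples, where
      delta       = w_k^{t+1} - w^t  (the client's raw local update),
      num_steps   = tau_k            (local gradient steps performed),
      num_samples = n_k              (local dataset size, used for p_k).
    """
    total_samples = sum(n for _, _, n in client_results)
    update = np.zeros_like(w_global)
    for delta, tau_k, n_k in client_results:
        p_k = n_k / total_samples          # data-size aggregation weight
        update += p_k * (delta / tau_k)    # normalize by local work before weighting
    return w_global + update
```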
Compare this to the implicit update direction in FedAvg when viewed through local updates:
$$w^{t+1}_{\text{FedAvg}} = w^t + \sum_{k \in S_t} p_k \Delta_k^t$$

Effectively, FedNova averages the per-step local update $\Delta_k^t / \tau_k$, weighted by $p_k$. This prevents clients who performed many local steps ($\tau_k \gg \tau_j$) from disproportionately influencing the magnitude and direction of the aggregated update simply because they computed more. It assumes that $\Delta_k^t / \tau_k$ provides a better estimate of the local gradient direction relevant to the global objective than the raw update $\Delta_k^t$ when $\tau_k$ varies.
Consider two clients with the same amount of data ($p_1 = p_2 = 0.5$) but different computational capabilities. Client 1 performs $\tau_1 = 2$ local steps, while Client 2 performs $\tau_2 = 10$ local steps. Their raw updates are $\Delta_1^t$ and $\Delta_2^t$.
FedAvg directly averages the potentially disparate updates $\Delta_1^t$ and $\Delta_2^t$. If Client 2's update $\Delta_2^t$ is much larger because $\tau_2 = 10$, it might dominate. FedNova first normalizes each update by its number of steps $\tau_k$ before averaging, potentially leading to a different aggregate direction that better reflects the average single-step progress, as the sketch below illustrates.
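To make the difference concrete, the following sketch plugs hypothetical numbers into both aggregation rules. The specific update vectors are invented for illustration; only the relative magnitudes matter.

```python
import numpy as np

# Hypothetical raw updates: Client 2 took 10 steps, so its raw update is larger.
delta_1, tau_1 = np.array([0.2, -0.1]), 2    # Client 1: 2 local steps
delta_2, tau_2 = np.array([1.0, 1.0]), 10    # Client 2: 10 local steps
p_1 = p_2 = 0.5                              # equal data sizes

# FedAvg: average the raw updates; the larger update dominates.
fedavg_update = p_1 * delta_1 + p_2 * delta_2

# FedNova: average the per-step updates; each client contributes its
# average step direction regardless of how many steps it took.
fednova_update = p_1 * (delta_1 / tau_1) + p_2 * (delta_2 / tau_2)

print("FedAvg :", fedavg_update)   # [0.6  0.45]  -- pulled toward Client 2
print("FedNova:", fednova_update)  # [0.1  0.025] -- balanced per-step average
```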
The primary advantage of FedNova is its robustness to systems heterogeneity. By normalizing updates based on local work:

- Clients with limited compute are not drowned out by faster clients that complete more local steps, so contributions are weighted more fairly.
- The aggregated update reflects the average per-step progress across clients, which can lead to faster and more stable convergence than FedAvg when local step counts differ widely.
Implementing FedNova requires only a minor modification to the standard FL protocol:

- Each client counts the number of local steps $\tau_k$ it performs during local training and reports it to the server along with its update $\Delta_k^t$ (and, typically, its sample count $n_k$).
- The server divides each received update by the reported $\tau_k$ before applying the usual weighted averaging.
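A client-side sketch of that change is shown below, assuming a plain PyTorch-style local SGD loop with a classification loss. The function name `local_train` and the returned `(delta, tau_k, n_k)` tuple are illustrative choices, not part of any specific framework; the only difference from a FedAvg client is that the number of gradient steps is counted and returned.

```python
import copy
import torch

def local_train(global_model, data_loader, epochs, lr=0.01):
    """Client-side round for normalized averaging (illustrative sketch).

    Identical to a standard FedAvg client, except that the number of local
    SGD steps (tau_k) is counted and returned alongside the raw update.
    """
    model = copy.deepcopy(global_model)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()

    tau_k = 0  # count of local gradient steps actually performed
    for _ in range(epochs):
        for x, y in data_loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
            tau_k += 1

    # Raw local update: difference between trained and global parameters.
    delta = [p_new.detach() - p_old.detach()
             for p_new, p_old in zip(model.parameters(), global_model.parameters())]
    n_k = len(data_loader.dataset)  # local sample count, used for the weight p_k
    return delta, tau_k, n_k
```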
It's important to note that FedNova primarily addresses systems heterogeneity. It does not directly solve the challenges of statistical heterogeneity (Non-IID data), although it can potentially be combined with algorithms like FedProx or SCAFFOLD designed for that purpose. The effectiveness of FedNova relies on the assumption that the normalized update Δkt/τk is a meaningful quantity, representing an average step direction. This generally holds well for common local solvers like SGD.
In summary, FedNova provides a theoretically grounded and practically effective mechanism for mitigating the adverse effects of varying computational resources and local steps across clients in a federated network. By normalizing updates before aggregation, it promotes fairer contributions and can lead to faster, more stable convergence compared to FedAvg in heterogeneous system environments. It's a valuable tool for building more performant FL systems when client capabilities vary significantly.