While differential privacy introduces statistical noise to protect data, and secure multi-party computation uses interactive protocols, Homomorphic Encryption (HE) offers a distinct cryptographic approach to secure aggregation in federated learning. HE allows computations to be performed directly on encrypted data without needing to decrypt it first. For the typical federated aggregation task, we are interested in schemes that are additively homomorphic.
The Magic of Additive Homomorphism
Imagine you have a special kind of encryption, let's call it Encryptpk(⋅), using a public key pk. An additive HE scheme has the remarkable property that if you combine the encrypted versions of two messages, say u1 and u2, using a specific operation ⊕ defined for the ciphertexts, the result is equivalent to the encryption of the sum of the original messages:
Encryptpk(u1)⊕Encryptpk(u2)=Encryptpk(u1+u2)
This property extends naturally to multiple messages. In the context of federated learning, each client i computes their local update ui. Instead of sending ui in the clear, they encrypt it using the public key pk to get ci=Encryptpk(ui). The clients send their ciphertexts c1,c2,…,cN to the central server.
The server, possessing only the public key, cannot decrypt any individual ci. However, leveraging the additive homomorphism, it can compute the sum of the ciphertexts:
C=c1⊕c2⊕⋯⊕cN
Due to the homomorphic property, this resulting ciphertext C is actually the encryption of the sum of all client updates:
C=Encryptpk(∑i=1Nui)
The server now has the encrypted aggregate result. To retrieve the actual sum ∑ui, the ciphertext C must be decrypted using the corresponding secret key sk. Crucially, the server typically does not possess this secret key.
HE Schemes for Aggregation
Several cryptographic schemes exhibit additive homomorphism and are potentially suitable for FL aggregation. Some well-known examples include:
- Paillier Cryptosystem: A widely studied scheme that is additively homomorphic. It's relatively efficient for addition but can have significant ciphertext expansion.
- Lattice-Based Cryptography (e.g., BGV, BFV, CKKS variants): These schemes can support both addition and multiplication (making them Fully Homomorphic, or FHE), but can be configured or used in ways that efficiently support just the additions needed for aggregation. They often involve more complex mathematical foundations but can offer performance advantages for certain operations or parameter settings.
For the purpose of simple aggregation (∑ui), we only strictly need Partially Homomorphic Encryption (PHE) that supports addition. FHE schemes, while more powerful, introduce substantial computational and complexity overhead that is often unnecessary for basic aggregation.
Workflow of HE-based Secure Aggregation
Let's outline the steps involved in a typical FL round using additive HE:
-
Setup:
- An entity (which could be a trusted third party, clients collaboratively, or sometimes the server under specific trust assumptions) generates an HE key pair (pk,sk).
- The public key pk is distributed to all participating clients.
- The secret key sk is kept confidential and held only by the designated decrypting party/parties. The security hinges on who controls sk.
-
Client Training and Encryption:
- Each client i performs local training and computes its model update ui.
- Client i encrypts its update using the public key: ci=Encryptpk(ui). This step can be computationally intensive for the client.
- Client i sends the ciphertext ci to the server. Note that ci is typically much larger than ui.
-
Server Aggregation:
- The server receives encrypted updates c1,…,cN from the participating clients.
- The server homomorphically adds the ciphertexts: C=⨁i=1Nci. This step requires the server to perform computations on ciphertexts, which can also be resource-intensive depending on the HE scheme.
-
Decryption and Model Update:
- The server sends the aggregate ciphertext C=Encryptpk(∑ui) to the entity holding the secret key sk.
- This entity decrypts C using sk to obtain the plaintext aggregate sum: S=Decryptsk(C)=∑ui.
- The decrypted sum S is then used (often by the server, after receiving it from the decryptor) to update the global model parameters.
The following diagram illustrates this flow:
Data flow in a federated learning round using additive homomorphic encryption for secure aggregation. Clients encrypt updates, the server aggregates ciphertexts, and a separate entity decrypts the sum.
Weighing the Pros and Cons
HE offers a compelling approach but comes with significant trade-offs:
Advantages:
- Strong Privacy: Provides cryptographic guarantees that the server cannot learn individual client updates, assuming the server does not possess the secret key.
- No Accuracy Loss: Unlike DP, HE does not introduce noise into the updates. The aggregated result is exact, preserving model accuracy.
- Non-Interactive Aggregation: Once keys are set up, clients simply send their encrypted updates; no complex multi-round protocols between clients are needed for aggregation itself (unlike many SMC approaches).
Disadvantages:
- Computational Overhead: HE operations (especially encryption on the client side and potentially aggregation on the server side) are computationally expensive compared to plaintext operations. This can be a major bottleneck for resource-constrained clients (e.g., mobile devices) and require powerful servers.
- Ciphertext Expansion: Encrypted data (ciphertexts) is substantially larger than the original plaintext data (updates). This significantly increases the communication cost, particularly the uplink from clients to the server.
Illustrative comparison showing the significant increase in data size when encrypting model updates using HE. The actual expansion factor depends heavily on the specific HE scheme and chosen security parameters.
- Key Management Complexity: Secure generation, distribution, and storage of the secret key sk are critical. If the server holds sk, the privacy guarantee against the server is nullified. Solutions like threshold HE (distributing sk among multiple parties) improve security but add system complexity.
- Limited Operations (PHE): Additive HE schemes generally only support addition. While sufficient for basic FedAvg-style aggregation, they don't directly support operations like element-wise multiplication or more complex functions needed for some advanced FL algorithms without resorting to FHE or combining HE with other techniques.
- Parameter Selection: Choosing the right HE scheme and its parameters (e.g., polynomial degree, coefficient modulus, security level in lattice-based schemes; key size in Paillier) is complex. These choices impact security, performance (computation and communication), and the range of values that can be encrypted.
Practical Considerations
- Threshold Homomorphic Encryption: To avoid trusting a single entity with the secret key, threshold HE schemes can be used. The secret key is split into shares distributed among several parties (e.g., a subset of clients or dedicated key servers). Decryption requires a quorum (a minimum number) of these parties to collaborate, significantly enhancing robustness against single points of failure or attack.
- Hybrid Approaches: Sometimes, HE is combined with other privacy techniques. For instance, clients might encrypt data with HE, and the server might use SMC protocols to perform operations beyond simple addition on these ciphertexts, potentially balancing performance and security trade-offs.
- Implementation Libraries: Implementing HE requires specialized cryptographic libraries such as Microsoft SEAL, PALISADE, HElib, or TFHE. Integrating these into FL frameworks requires careful engineering.
Compared to DP, HE provides cryptographic guarantees without noise but incurs higher computational and communication costs. Compared to SMC, HE often simplifies the aggregation protocol itself (less interaction) but shifts complexity to computation and key management. The choice between these techniques depends heavily on the specific system constraints, the desired level of privacy, the trust model, and the acceptable performance overhead. HE remains an active area of research, with ongoing efforts to improve efficiency and make it more practical for large-scale federated learning deployments.