As we discussed earlier, treating each agent in a multi-agent system as a completely independent learner (like using Independent Q-Learning or IDDPG) is a straightforward approach, but it faces significant hurdles, particularly the non-stationarity problem. When many agents learn concurrently, the environment effectively changes from each agent's perspective, destabilizing the learning process. Furthermore, training separate models for potentially numerous agents can be computationally expensive and data-inefficient, especially if the agents share similarities.
Parameter sharing offers a pragmatic strategy to mitigate some of these issues, particularly in scenarios involving homogeneous agents, meaning agents that have similar goals, observation spaces, and action spaces, or are interchangeable. The core idea is simple: instead of training entirely separate models (policy networks, value networks, or both) for each agent, we train a single model whose parameters are shared across multiple agents or even all agents.
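To make the distinction concrete, here is a minimal PyTorch sketch contrasting independent learners with parameter sharing. The names `PolicyNet`, `obs_dim`, `act_dim`, and `n_agents` are illustrative placeholders, not part of any particular MARL library.

```python
import torch.nn as nn

class PolicyNet(nn.Module):
    """Small policy network mapping an agent's observation to action logits."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

n_agents, obs_dim, act_dim = 4, 16, 5  # illustrative sizes

# Independent learners: one parameter set theta_i per agent.
independent_policies = [PolicyNet(obs_dim, act_dim) for _ in range(n_agents)]

# Parameter sharing: a single parameter set theta used by every agent.
shared_policy = PolicyNet(obs_dim, act_dim)
```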
Figure: comparison between independent learners, each with its own set of model parameters $\theta_i$, and parameter sharing, where multiple agents utilize a single set of model parameters $\theta$.
In practice, during training, experiences (state transitions, actions, rewards) collected from multiple agents are used to update the single shared model. For instance, if using a shared policy network $\pi_\theta(a_i \mid o_i)$ for agent $i$, the gradient update for the shared parameters $\theta$ typically aggregates gradients computed from the experiences of all agents sharing those parameters:
$$\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \nabla_\theta J_i(\theta)$$

Here, $J_i(\theta)$ is the objective function (e.g., the policy gradient objective) computed using data from agent $i$, and $N$ is the number of agents sharing the parameters $\theta$. Each agent still receives its own observation $o_i$ and chooses its own action $a_i$ from the shared policy conditioned on that observation. Similarly, if sharing a Q-network $Q_\theta(o_i, a_i)$, updates are based on transitions from all participating agents.
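As a rough sketch of how this aggregated update might be implemented, the snippet below reuses `shared_policy` from the sketch above and performs one REINFORCE-style gradient step on the shared parameters $\theta$. The batch format, a list of per-agent `(obs, actions, returns)` tensors, is an assumption made purely for illustration.

```python
import torch

optimizer = torch.optim.Adam(shared_policy.parameters(), lr=3e-4)

def shared_policy_update(per_agent_batches):
    """One gradient step on the shared parameters using data from all agents.

    per_agent_batches: list with one (obs, actions, returns) tuple of tensors
    per agent. Summing the per-agent losses and calling backward() once
    yields the averaged gradient from the equation above.
    """
    total_loss = 0.0
    for obs, actions, returns in per_agent_batches:
        logits = shared_policy(obs)  # pi_theta(. | o_i), same theta for every agent
        log_probs = torch.distributions.Categorical(logits=logits).log_prob(actions)
        total_loss = total_loss - (log_probs * returns).mean()  # REINFORCE-style J_i

    total_loss = total_loss / len(per_agent_batches)  # average over the N agents
    optimizer.zero_grad()
    total_loss.backward()  # backpropagates the averaged per-agent objective
    optimizer.step()
```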
While attractive, parameter sharing isn't a universal solution. Its effectiveness hinges on the agents being sufficiently homogeneous: when agents have similar observation spaces, action spaces, and roles, a single model can represent all of their policies well, but when agents need genuinely different behaviors, forcing them to share one set of parameters can limit performance.
Parameter sharing is often employed within the Centralized Training with Decentralized Execution (CTDE) framework, which we will discuss later. During centralized training, the shared parameters can be updated efficiently using aggregated data. During decentralized execution, each agent simply loads a copy of the trained shared model to make its decisions based on its local observations.
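A sketch of the execution side, again reusing `shared_policy` and the placeholder dimensions from the earlier snippets: each agent simply carries a copy of the trained shared parameters and acts on its own local observation.

```python
import copy
import torch

# Decentralized execution: each agent holds its own copy of the trained
# shared parameters and selects actions from its local observation only.
agent_policies = [copy.deepcopy(shared_policy) for _ in range(n_agents)]

local_obs = torch.randn(n_agents, obs_dim)  # placeholder local observations
with torch.no_grad():
    actions = [
        torch.distributions.Categorical(logits=policy(obs)).sample()
        for policy, obs in zip(agent_policies, local_obs)
    ]
```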
This technique represents a valuable tool in the MARL toolbox, particularly effective for cooperative tasks involving teams of similar agents, such as coordinating robot swarms or managing units in real-time strategy games. However, always consider the degree of agent homogeneity before applying it.