The Role of the KL Divergence Penalty in PPO