You've now seen the REINFORCE algorithm, a method for directly optimizing the policy parameters θ based on the gradient of the expected return. The core idea is to increase the probability of actions that lead to higher returns and decrease the probability of actions leading to lower returns. The gradient estimate we use in the simplest Monte Carlo version of REINFORCE for a single trajectory is proportional to:
$$\sum_{t=0}^{T-1} G_t \, \nabla_\theta \log \pi(A_t \mid S_t; \theta)$$

Here, $G_t = \sum_{k=t+1}^{T} \gamma^{k-t-1} R_k$ is the total discounted return experienced starting from state $S_t$ and taking action $A_t$ in that specific episode. While this estimate is unbiased (meaning its expected value is the true gradient $\nabla_\theta J(\theta)$), it often suffers from a significant practical problem: high variance.
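To make the estimate concrete, here is a minimal sketch of how it is often formed in practice, assuming a PyTorch policy whose action log-probabilities were recorded while acting and a `rewards` list where `rewards[t]` holds $R_{t+1}$ (the function name and argument layout are placeholders, not prescribed above):

```python
import torch

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Surrogate loss for one trajectory: its gradient matches the
    REINFORCE estimate sum_t G_t * grad log pi(A_t | S_t; theta).

    log_probs: list of 0-dim tensors log pi(A_t | S_t; theta), collected while acting
    rewards:   list of floats, where rewards[t] holds R_{t+1}
    """
    T = len(rewards)
    returns = torch.empty(T)
    g = 0.0
    # Backward scan gives G_t = R_{t+1} + gamma * G_{t+1}.
    for t in reversed(range(T)):
        g = rewards[t] + gamma * g
        returns[t] = g
    # Minimizing the negated sum performs gradient ascent on the objective.
    return -(torch.stack(log_probs) * returns).sum()
```

Because `returns` scales every log-probability term, any noise in the sampled returns translates directly into noise in the resulting gradient.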
Variance refers to how much a random variable deviates from its expected value. In the context of REINFORCE, the per-step gradient estimate $G_t \nabla_\theta \log \pi(A_t \mid S_t; \theta)$ can vary dramatically from one episode (or even one time step) to the next.
Why does this happen? It stems directly from the use of the Monte Carlo return $G_t$: the return sums rewards over the entire remainder of the episode, so it accumulates randomness from the policy's own action choices, the environment's dynamics, and the rewards at every subsequent step.
Consider an agent learning to play a game. An action taken early in the game might be strategically sound, but if the agent happens to get unlucky with random events much later, the resulting $G_t$ might be very low or even negative. The REINFORCE update would then penalize this initially good action based on outcomes that were largely unrelated to the action itself.
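A tiny simulation makes this tangible. The numbers below (reward distribution, horizon, discount) are invented purely for illustration; the point is how widely the sampled return for the very first decision spreads across episodes:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, T = 0.99, 50

def sampled_return():
    """G_0 for one episode of a toy process with noisy, mostly unrelated later rewards."""
    rewards = rng.normal(loc=0.1, scale=1.0, size=T)   # hypothetical reward noise
    return float(np.sum(gamma ** np.arange(T) * rewards))

g0 = np.array([sampled_return() for _ in range(10_000)])
print(f"mean G_0 = {g0.mean():.2f}, std = {g0.std():.2f}, "
      f"fraction negative = {(g0 < 0).mean():.2f}")
# The spread is larger than the mean, so a sizeable fraction of episodes
# would penalize the first action even though it is good on average.
```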
High variance in the gradient estimates has two main negative consequences for the learning process: individual updates become unreliable, and many episodes must be averaged before the true gradient direction emerges.
Imagine trying to find the bottom of a valley (representing the optimal policy parameters) by taking steps based on very noisy measurements of the slope. Each measurement might point you in a wildly different direction. Only by averaging many noisy measurements can you get a reliable sense of the actual downhill direction. REINFORCE faces a similar challenge.
High-variance estimates fluctuate significantly around the true gradient direction, making learning unstable. Low-variance estimates provide a more consistent signal, leading to smoother convergence.
The core issue is that the magnitude of the update step, determined by $G_t$, is noisy. We are multiplying the score function $\nabla_\theta \log \pi(A_t \mid S_t; \theta)$ (which tells us which direction to nudge the parameters to make action $A_t$ more or less likely) by a potentially unreliable estimate of how good that action actually was in the long run for that specific trajectory.
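To see this effect in isolation, the sketch below pairs an exact score function for a toy two-action softmax policy with a deliberately noisy return (all numbers invented for illustration). Both actions are equally good on average, so the true gradient is zero, yet the single-episode estimates swing widely and frequently flip sign:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.zeros(2)                          # logits of a toy 2-action softmax policy
probs = np.exp(theta) / np.exp(theta).sum()

def score(action):
    """grad_theta log pi(action) for a softmax policy: one-hot(action) - probs."""
    grad = -probs.copy()
    grad[action] += 1.0
    return grad

estimates = []
for _ in range(10_000):
    a = rng.choice(2, p=probs)
    g = rng.normal(1.0, 5.0)                 # noisy return; both actions equally good
    estimates.append(g * score(a))           # single-episode gradient estimate

estimates = np.array(estimates)
print("mean:", estimates.mean(axis=0), "std:", estimates.std(axis=0))
# The mean is near zero (the true gradient), but each individual estimate
# has a magnitude driven almost entirely by the noise in the sampled return.
```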
This variance problem is a major challenge for basic policy gradient methods. Fortunately, there are ways to mitigate it. The next section introduces a common and effective technique: using baselines to reduce variance while keeping the gradient estimate unbiased.