Intermediate Reinforcement Learning Techniques
Chapter 1: Revisiting Reinforcement Learning Fundamentals
The Reinforcement Learning Problem Setup
Markov Decision Processes (MDPs) Recap
Value Functions and Bellman Equations
Tabular Solution Methods: Q-Learning and SARSA
Limitations of Tabular Methods
Chapter 2: Deep Q-Networks (DQN)
Introduction to Function Approximation
Using Neural Networks for Q-Value Approximation
The DQN Algorithm Architecture
Experience Replay Mechanism
Fixed Q-Targets (Target Networks)
Loss Function for DQN Training
Hands-on Practical: Implementing DQN for CartPole
Chapter 3: Improvements and Variants of DQN
The Overestimation Problem in Q-Learning
Double DQN: Decoupling Action Selection and Evaluation
Dueling Network Architectures
Prioritized Experience Replay (Brief Overview)
Combining DQN Improvements
Hands-on Practical: Implementing Double DQN
Chapter 4: Policy Gradient Methods
Limitations of Value-Based Methods
Direct Policy Parameterization
The Policy Gradient Theorem
Understanding Variance in Policy Gradients
Baselines for Variance Reduction
Hands-on Practical: Implementing REINFORCE
Chapter 5: Actor-Critic Methods
Combining Policy and Value Estimation
Actor-Critic Architecture Overview
Advantage Actor-Critic (A2C)
Asynchronous Advantage Actor-Critic (A3C)
Implementation Considerations for Actor-Critic
Comparison: REINFORCE vs. A2C/A3C
Hands-on Practical: Implementing A2C