Introduction to Reinforcement Learning
Chapter 1: Foundations of Reinforcement Learning
What is Reinforcement Learning?
Agents and Environments
States, Actions, and Rewards
Policies: Mapping States to Actions
The RL Workflow: Interaction Loops
Types of RL Tasks: Episodic vs Continuing
Comparing RL with Other Learning Types
Setting up Your Python Environment for RL
Quiz for Chapter 1
Chapter 2: Markov Decision Processes (MDPs)
Modeling Sequential Decision Making
Formal Definition of an MDP
State Transition Probabilities
Reward Functions
Return: Cumulative Future Rewards
Discounting Future Rewards
Policies and Value Functions (Vπ, Qπ)
Finding Optimal Policies
Quiz for Chapter 2
Chapter 3: Estimating Value Functions
The Bellman Expectation Equation
The Bellman Optimality Equation
Solving Bellman Equations (Overview)
Dynamic Programming: Policy Iteration
Dynamic Programming: Value Iteration
Limitations of Dynamic Programming
Quiz for Chapter 3
Chapter 4: Monte Carlo Methods
Learning from Complete Episodes
Monte Carlo Prediction: Estimating Vπ
Monte Carlo Control: Estimating Qπ
On-Policy vs Off-Policy Learning
MC Control without Exploring Starts
On-Policy First-Visit MC Control Implementation
Off-Policy MC Prediction and Control Intro
Practice: Implementing MC Prediction
Quiz for Chapter 4
Chapter 5: Temporal-Difference Learning
Learning from Incomplete Episodes
TD(0) Prediction: Estimating Vπ
Advantages of TD Learning over MC
SARSA: On-Policy TD Control
Q-Learning: Off-Policy TD Control
Comparing SARSA and Q-Learning
Expected SARSA
Hands-on Practical: Implementing Q-Learning
Quiz for Chapter 5
Chapter 6: Function Approximation in RL
Handling Large State Spaces
Value Function Approximation (VFA)
Feature Vectors for State Representation
Linear Methods for VFA
Gradient Descent for Parameter Learning
Semi-gradient TD Methods
Using Neural Networks for VFA
Practice: Applying Linear VFA
Quiz for Chapter 6
Chapter 7: Introduction to Deep Q-Networks (DQN)
Combining Q-Learning with Deep Learning
Challenges with Neural Networks in RL
Experience Replay Mechanism
Fixed Q-Targets (Target Networks)
The DQN Algorithm Structure
Architectural Considerations for DQNs
Hands-on Practical: Building a Basic DQN
Quiz for Chapter 7
Chapter 8: Introduction to Policy Gradient Methods
Learning Policies Directly
Policy Gradient Theorem (Concept)
REINFORCE Algorithm
Baselines for Variance Reduction
Actor-Critic Methods Overview
Comparing Value-Based and Policy-Based Methods
Practice: Implementing REINFORCE
Quiz for Chapter 8
States, Actions, and Rewards
© 2025 ApX Machine Learning