In the previous chapter, we established the goal of training: minimizing a loss function L by adjusting network weights using gradient descent. However, efficiently calculating the gradient ∇L with respect to all weights in a multi-layer network requires a specific technique.
This chapter introduces the backpropagation algorithm, the standard method for computing these gradients. We will examine its foundation in the chain rule from calculus and visualize the process using computational graphs. We will then move beyond basic gradient descent to more sophisticated optimization algorithms, including Momentum, RMSprop, and Adam, which accelerate convergence and handle complex loss surfaces more effectively. Upon completion, you will understand how gradients are computed and propagated backward through the network, and how advanced optimizers refine the training process.
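To preview the ideas covered in the sections below, here is a minimal sketch (not code from the chapter) of a single gradient descent step on one neuron with a squared-error loss, where the gradient is computed by hand with the chain rule. All variable names and the specific setup are illustrative assumptions.

```python
import numpy as np

# Hypothetical setup: one neuron with a sigmoid activation and
# a squared-error loss on a single training example.
np.random.seed(0)
x = np.array([0.5, -1.2])      # input features
y = 1.0                        # target value
w = np.random.randn(2)         # weights
b = 0.0                        # bias
lr = 0.1                       # learning rate

# Forward pass: compute the prediction and the loss.
z = w @ x + b                  # pre-activation
a = 1.0 / (1.0 + np.exp(-z))   # sigmoid activation
L = 0.5 * (a - y) ** 2         # squared-error loss

# Backward pass: chain rule, dL/dw = dL/da * da/dz * dz/dw.
dL_da = a - y                  # derivative of the loss w.r.t. the activation
da_dz = a * (1.0 - a)          # derivative of the sigmoid
dz_dw = x                      # derivative of the pre-activation w.r.t. weights
grad_w = dL_da * da_dz * dz_dw
grad_b = dL_da * da_dz

# Gradient descent update: step against the gradient.
w -= lr * grad_w
b -= lr * grad_b
print(f"loss before update: {L:.4f}")
```

Backpropagation generalizes this pattern to every weight in a multi-layer network, and the optimizers discussed later replace the plain update step with more refined rules.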
4.1 Calculating Gradients: The Chain Rule
4.2 Computational Graphs
4.3 The Backpropagation Algorithm Explained
4.4 Forward Pass vs. Backward Pass
4.5 Gradient Descent with Momentum
4.6 RMSprop Optimizer
4.7 Adam Optimizer
4.8 Choosing an Optimization Algorithm
4.9 Hands-on Practical: Backpropagation Step-by-Step