Having explored ways to control model complexity through regularization, we now turn our attention to the process of finding the optimal parameters for our deep learning models. This process, known as optimization, is central to training neural networks effectively. Standard gradient descent provides the theoretical foundation, but its application to large datasets presents practical challenges.
This chapter introduces the foundational optimization algorithms used in deep learning. We will begin by reviewing standard gradient descent and discussing its limitations. You will then learn about Stochastic Gradient Descent (SGD), mini-batch gradient descent, SGD with momentum, and Nesterov Accelerated Gradient (NAG).
By the end of this chapter, you will understand the mechanics behind these core algorithms, their respective advantages and disadvantages, and how they address the challenges of navigating complex loss surfaces during model training. We will also implement and compare these optimizers in practice.
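As a brief preview of the kind of comparison this chapter builds toward, the sketch below contrasts a plain gradient descent update with a momentum update on a simple quadratic loss. The quadratic objective, the learning rate, and the momentum coefficient are illustrative choices for this sketch, not values prescribed by the chapter.

```python
import numpy as np

# Illustrative quadratic loss: L(w) = 0.5 * ||w||^2, so grad L(w) = w.
def grad(w):
    return w

w_gd = np.array([2.0, -3.0])   # parameters updated by plain gradient descent
w_mom = w_gd.copy()            # parameters updated with momentum
velocity = np.zeros_like(w_mom)

lr = 0.1     # learning rate (illustrative value)
beta = 0.9   # momentum coefficient (illustrative value)

for step in range(50):
    # Plain gradient descent: w <- w - lr * grad(w)
    w_gd -= lr * grad(w_gd)

    # Momentum: accumulate a velocity from past gradients, then move along it
    velocity = beta * velocity + grad(w_mom)
    w_mom -= lr * velocity

print("gradient descent:", w_gd)
print("momentum:        ", w_mom)
```

In stochastic and mini-batch variants, the same update rules apply, but `grad` is evaluated on a single example or a small batch rather than the full dataset; the later sections of this chapter develop these differences in detail.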
5.1 Revisiting Gradient Descent
5.2 Challenges with Standard Gradient Descent
5.3 Stochastic Gradient Descent (SGD)
5.4 Mini-batch Gradient Descent
5.5 SGD Challenges: Noise and Local Minima
5.6 SGD with Momentum: Accelerating Convergence
5.7 Nesterov Accelerated Gradient (NAG)
5.8 Implementing SGD and Momentum
5.9 Practice: Comparing GD, SGD, and Momentum