Optimization algorithms form the computational core of training machine learning models. Before addressing sophisticated methods, this chapter revisits the essential principles upon which they are built. We will start by reviewing standard first-order optimization algorithms like Stochastic Gradient Descent (SGD), Momentum, and Nesterov Accelerated Gradient (NAG), establishing a baseline for comparison.
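As a concrete point of reference for that comparison, the sketch below writes out one common formulation of each update rule in plain NumPy on a toy quadratic objective. The matrix A, learning rate, and momentum coefficient are illustrative choices, not values prescribed by this chapter.

```python
import numpy as np

# Toy quadratic objective f(w) = 0.5 * w^T A w with gradient A w.
# A is deliberately ill-conditioned so the axes have different curvature.
A = np.diag([1.0, 10.0])
grad = lambda w: A @ w

def sgd_step(w, lr=0.05):
    # Plain gradient descent: move against the gradient.
    return w - lr * grad(w)

def momentum_step(w, v, lr=0.05, mu=0.9):
    # Heavy-ball momentum (one common formulation): accumulate a velocity,
    # then move the parameters along the accumulated direction.
    v = mu * v - lr * grad(w)
    return w + v, v

def nag_step(w, v, lr=0.05, mu=0.9):
    # Nesterov: evaluate the gradient at the "look-ahead" point w + mu*v.
    v = mu * v - lr * grad(w + mu * v)
    return w + v, v

w, v = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(50):
    w, v = nag_step(w, v)
print(w)  # approaches the minimizer at the origin
```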
Understanding the mathematical properties of the objective function is fundamental. We will examine convexity: a function f is convex if, for any x, y in its domain and any λ ∈ [0, 1], f(λx + (1−λ)y) ≤ λf(x) + (1−λ)f(y). We will discuss why this property simplifies optimization, and we will also explore the typical geometry of loss surfaces encountered in machine learning, especially in high-dimensional spaces.
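To see the definition in action, the short check below samples random pairs (x, y) and mixing weights λ and verifies the inequality numerically for the illustrative convex function f(x) = x²; the sampling range and tolerance are arbitrary choices.

```python
import numpy as np

# Numerically spot-check the convexity inequality
#   f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)
# for the illustrative convex function f(x) = x**2.
f = lambda x: x ** 2

rng = np.random.default_rng(0)
for _ in range(10_000):
    x, y = rng.uniform(-10, 10, size=2)
    lam = rng.uniform(0, 1)
    lhs = f(lam * x + (1 - lam) * y)
    rhs = lam * f(x) + (1 - lam) * f(y)
    assert lhs <= rhs + 1e-12  # small tolerance for floating-point rounding
print("convexity inequality held on all sampled points")
```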
The basics of convergence analysis introduce the key metrics for evaluating optimizers, including the different rates at which algorithms approach a solution. We will identify common difficulties that arise in the non-convex optimization problems typical of deep learning, such as local minima and saddle points, and then turn to practical considerations of numerical stability and the effects of floating-point arithmetic. The chapter concludes with exercises focused on analyzing the behavior of these foundational algorithms.
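As a small illustration of the floating-point effects mentioned above (assuming NumPy and a deliberately tiny step of 1e-8), adding a very small update to a parameter stored in float32 can leave it unchanged, while float64 still resolves the same update.

```python
import numpy as np

# In float32, a step much smaller than the machine epsilon (~1.2e-7)
# relative to the parameter is silently lost to rounding.
w32 = np.float32(1.0)
step = np.float32(1e-8)
print(w32 + step == w32)               # True: the update vanishes
print(np.float64(1.0) + 1e-8 == 1.0)   # False: float64 still resolves it
```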
1.1 Revisiting Gradient Descent Variants
1.2 The Role of Convexity
1.3 Understanding Loss Surfaces
1.4 Convergence Analysis Fundamentals
1.5 Challenges in Non-Convex Optimization
1.6 Numerical Stability Considerations
1.7 Practice: Analyzing Convergence Behavior