With the structure of neural networks established, including neurons, layers, and activation functions, the next step is to understand how these networks learn from data. Learning involves adjusting the network's internal parameters, its weights and biases, to improve prediction accuracy. This chapter focuses on the mechanisms behind this learning process.
First, we need a way to measure how well (or poorly) the network is performing. This is accomplished using loss functions, which quantify the difference between the network's predictions and the actual target values. We will cover common loss functions for regression, such as Mean Squared Error (MSE) and Mean Absolute Error (MAE), and for classification, such as Cross-Entropy.
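As a preview of what these loss functions compute, the short sketch below evaluates MSE, MAE, and binary cross-entropy with NumPy. The prediction and target arrays are made-up values chosen only for illustration, not examples from the chapter.

```python
import numpy as np

# Hypothetical regression targets and predictions (illustrative values only).
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)    # Mean Squared Error
mae = np.mean(np.abs(y_true - y_pred))   # Mean Absolute Error

# Binary cross-entropy for a classification task: targets are 0/1 labels,
# predictions are probabilities. A small epsilon guards against log(0).
t = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.7, 0.6])
eps = 1e-12
bce = -np.mean(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps))

print(f"MSE: {mse:.3f}  MAE: {mae:.3f}  Cross-Entropy: {bce:.3f}")
```

Each of these returns a single number summarizing the error over all examples; lower values indicate predictions closer to the targets.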
With a measure of error defined, the objective is to minimize it by systematically adjusting the network's parameters. The main algorithm for this optimization is gradient descent. We will explain its mechanics, including how it uses gradients to iteratively update weights. Key concepts like the learning rate will be discussed, along with widely used variants such as Stochastic Gradient Descent (SGD) and mini-batch gradient descent. We will also briefly cover some challenges associated with the gradient descent process. By the end of this chapter, you will understand how a neural network refines itself based on the data it sees.
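To make the update rule concrete before the detailed sections, here is a minimal gradient descent sketch that fits a single weight under the MSE loss. The data, learning rate, and step count are assumptions made for this example; they are not taken from the chapter.

```python
import numpy as np

# Toy data: the true relationship is y = 2x, so gradient descent
# should drive the weight w toward 2.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

w = 0.0              # initial weight
learning_rate = 0.05 # step size for each update

for step in range(100):
    y_pred = w * x
    loss = np.mean((y_pred - y) ** 2)        # MSE loss
    grad = np.mean(2 * (y_pred - y) * x)     # derivative of the loss w.r.t. w
    w -= learning_rate * grad                # gradient descent update

print(f"learned w = {w:.3f}, final loss = {loss:.5f}")
```

The loop repeats the same three steps the chapter will examine in detail: compute predictions, measure the loss, and move the parameter a small amount against the gradient. Shrinking or enlarging the learning rate changes how quickly, and how stably, the weight converges.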
3.1 Measuring Performance: Loss Functions
3.2 Common Loss Functions for Regression (MSE, MAE)
3.3 Common Loss Functions for Classification (Cross-Entropy)
3.4 Optimization: Finding the Best Weights
3.5 Gradient Descent Algorithm
3.6 Learning Rate
3.7 Stochastic Gradient Descent (SGD)
3.8 Challenges with Gradient Descent
3.9 Hands-on Practical: Visualizing Gradient Descent