When training a neural network, especially with large datasets, processing the entire dataset at once to compute the loss and update the weights can be computationally expensive and memory-intensive. Furthermore, using the entire dataset for each weight update (as in traditional batch gradient descent) can slow convergence or leave the optimizer stuck in poor local minima. To address this, the training process is typically broken down into smaller, manageable steps using the concepts of batches and epochs.
An epoch represents one complete pass through the entire training dataset. If your dataset contains 10,000 images, one epoch concludes after the model has seen and learned from all 10,000 images exactly once.
Training a deep learning model usually requires multiple epochs. Why? Because a single pass is rarely enough for the model's weights to converge to optimal values. The network needs to see the data multiple times to learn the underlying patterns effectively. Think of it like studying for an exam: you wouldn't just read the textbook once; you'd review the material multiple times (multiple epochs) to reinforce your understanding.
The number of epochs is a hyperparameter you set before training begins. Choosing the right number is important: too few epochs and the model underfits, having not yet learned the underlying patterns; too many and it can overfit, memorizing the training data instead of generalizing to new examples.
Instead of processing the entire dataset in one go during an epoch, we divide the dataset into smaller subsets called batches. The batch size determines how many training examples are included in each batch.
During each epoch, the training data is (usually) shuffled and then divided into these batches. The model processes one batch at a time: it runs a forward pass on the batch, computes the loss, backpropagates the gradients, and updates the weights. This process repeats for all batches within the epoch. Each time the model processes a batch and updates its weights, it's called one iteration or step.
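To make this concrete, here is a minimal sketch of how one epoch could be organized in plain NumPy. The train_step function here is a hypothetical placeholder for the forward pass, loss computation, backpropagation, and weight update that a framework like Keras performs for you.

import numpy as np

def run_one_epoch(x_train, y_train, batch_size, train_step):
    num_samples = len(x_train)
    # Shuffle the dataset at the start of the epoch
    indices = np.random.permutation(num_samples)
    x_shuffled, y_shuffled = x_train[indices], y_train[indices]

    # Process the data one batch (one iteration) at a time
    for start in range(0, num_samples, batch_size):
        x_batch = x_shuffled[start:start + batch_size]
        y_batch = y_shuffled[start:start + batch_size]
        train_step(x_batch, y_batch)  # one weight update per batch (placeholder)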
For example, if you have a dataset with 2,000 samples and you set the batch size to 100, then one epoch will consist of:

$$\text{Iterations per Epoch} = \frac{\text{Total Training Samples}}{\text{Batch Size}} = \frac{2000}{100} = 20 \text{ iterations}$$
The model's weights will be updated 20 times during one epoch.
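As a quick sanity check, the same calculation in Python, using ceil so that a smaller final batch still counts as one iteration when the dataset size is not an exact multiple of the batch size:

import math

total_samples = 2000
batch_size = 100

# ceil handles the case where the last batch is smaller than batch_size
iterations_per_epoch = math.ceil(total_samples / batch_size)
print(iterations_per_epoch)  # 20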
An illustration showing how a full training dataset is divided into batches within a single epoch. Each batch is processed sequentially, leading to a weight update (iteration).
Using batches (often called mini-batch gradient descent) offers several advantages over processing the entire dataset at once (batch gradient descent) or one sample at a time (stochastic gradient descent, or SGD, though in practice "SGD" often refers to mini-batch SGD): each batch fits comfortably in memory, the weights are updated many times per epoch rather than once, the noise in the batch gradient estimates can help the optimizer escape poor local minima, and batches map efficiently onto the parallel hardware (GPUs and TPUs) typically used for training.
The batch size is another significant hyperparameter. Common batch sizes are powers of 2 (e.g., 32, 64, 128, 256) because they tend to align well with hardware memory, but other values can work too. The choice involves trade-offs: larger batches give more stable (less noisy) gradient estimates and better hardware utilization, but they consume more memory and can sometimes generalize slightly worse; smaller batches use less memory and add helpful gradient noise, but each epoch requires more iterations and training can be less stable.
In practice, a batch size like 32 or 64 is often a good starting point. You might experiment with different sizes as part of hyperparameter tuning.
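As a rough sketch of how such an experiment might look, the loop below trains a fresh model for each candidate batch size and compares the final validation loss. Here build_model is a hypothetical helper assumed to return a newly compiled Keras model, and x_train, y_train, x_val, y_val are assumed to already exist as NumPy arrays.

for batch_size in [32, 64, 128]:
    model = build_model()  # hypothetical helper returning a compiled model
    history = model.fit(
        x_train, y_train,
        batch_size=batch_size,
        epochs=10,
        validation_data=(x_val, y_val),
        verbose=0,
    )
    val_loss = history.history['val_loss'][-1]
    print(f"batch_size={batch_size}: final val_loss={val_loss:.4f}")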
When you call the fit() method in Keras, you specify these parameters:
# Assume 'model' is compiled and 'x_train', 'y_train', 'x_val', 'y_val' are NumPy arrays
history = model.fit(
    x_train,
    y_train,
    batch_size=32,                   # number of samples per gradient update
    epochs=10,                       # number of passes over the entire dataset
    validation_data=(x_val, y_val),  # evaluated at the end of each epoch
)
Here, the model will train for 10 epochs. In each epoch, it processes the x_train data in batches of 32 samples, updating the weights after each batch. The number of iterations per epoch is len(x_train) / 32, rounded up when the dataset size is not an exact multiple of the batch size (the final batch is simply smaller).
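The History object returned by fit() records the loss and metrics after every epoch, which is a convenient way to judge whether the chosen number of epochs was too low or too high. For example:

# Per-epoch metrics recorded during training
print(history.history['loss'])      # training loss after each epoch
print(history.history['val_loss'])  # validation loss after each epoch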
Understanding batches and epochs is fundamental to controlling the training process. They dictate how the model learns from the data over time, influencing training stability, speed, memory usage, and ultimately, the model's generalization performance.