One of the most straightforward and frequently used methods for adjusting the learning rate during training is the Step Decay schedule. The central idea is to start with a relatively high learning rate and then reduce it by a certain factor at predefined intervals (epochs or steps). This approach allows the model to make substantial progress early in the training when parameters are likely far from their optimal values, and then take smaller, more careful steps as it gets closer to a minimum, helping to stabilize convergence and fine-tune the weights.
Imagine starting a search in a large, open field (the loss landscape). Initially, you can take large strides to cover ground quickly. As you get closer to where you think the target is, you shorten your steps to search more carefully in that local area. Step decay mimics this intuition.
The schedule operates based on a few parameters: the initial learning rate α0, the decay factor γ (a multiplicative factor between 0 and 1, commonly 0.1 or 0.5), and the step size S, the number of epochs (or steps) between successive drops.
The learning rate αt at epoch t can be defined based on the number of completed steps. If we let k=⌊t/S⌋ be the number of times the learning rate has been decayed by epoch t, then the learning rate is:
αt = α0 × γ^k = α0 × γ^⌊t/S⌋
For example, if α0 = 0.01, γ = 0.1, and S = 10 epochs, the learning rate is 0.01 for epochs 0–9, 0.001 for epochs 10–19, 0.0001 for epochs 20–29, and so on.
The learning rate remains constant between the steps, dropping instantaneously when an epoch threshold is reached.
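This piecewise-constant behavior is straightforward to express in code. Below is a minimal sketch (the step_decay_lr helper is an illustrative name of our own, not a framework function) that evaluates the formula for the α0 = 0.01, γ = 0.1, S = 10 example:

def step_decay_lr(epoch, initial_lr=0.01, gamma=0.1, step_size=10):
    """Learning rate at a given (0-indexed) epoch under step decay."""
    num_drops = epoch // step_size          # k = floor(t / S)
    return initial_lr * gamma ** num_drops

for epoch in (0, 9, 10, 19, 20):
    print(f"epoch {epoch:2d}: lr = {step_decay_lr(epoch):.6g}")
# epoch  0: lr = 0.01
# epoch  9: lr = 0.01
# epoch 10: lr = 0.001
# epoch 19: lr = 0.001
# epoch 20: lr = 0.0001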
The following chart illustrates a typical step decay schedule over 50 epochs, with an initial learning rate of 0.01, a decay factor of 0.5, and a step size of 15 epochs.
Learning rate schedule showing step decay with α0=0.01, γ=0.5, and S=15. The learning rate is halved every 15 epochs. Note the logarithmic y-axis.
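If you want to reproduce a plot like this yourself, the short sketch below generates the same schedule and draws it with matplotlib (assuming matplotlib is installed; the variable names are only illustrative):

import matplotlib.pyplot as plt

initial_lr, gamma, step_size = 0.01, 0.5, 15
epochs = list(range(50))
lrs = [initial_lr * gamma ** (epoch // step_size) for epoch in epochs]

plt.step(epochs, lrs, where="post")  # piecewise-constant schedule
plt.yscale("log")                    # log scale shows the equal-ratio drops clearly
plt.xlabel("Epoch")
plt.ylabel("Learning rate")
plt.title("Step decay: lr halved every 15 epochs")
plt.show()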
The effectiveness of step decay relies on choosing appropriate values for α0, γ, and S; these are hyperparameters that often need tuning.
While simple, step decay requires careful manual tuning of the schedule (when and how much to drop the rate). If the drops happen too early or too late, or if the factor is too large or too small, it might hinder convergence.
Most deep learning frameworks provide convenient ways to implement step decay. In PyTorch, you can use torch.optim.lr_scheduler.StepLR:
import torch
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
from torch.nn import Linear # Example Layer
# Assume 'model' is your defined neural network
model = Linear(10, 2) # A simple example model part
# Choose an optimizer (e.g., Adam or SGD)
optimizer = optim.Adam(model.parameters(), lr=0.01) # Initial LR = 0.01
# Define the StepLR scheduler
# Drop LR by factor 0.1 every 10 epochs
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)
# --- Inside your training loop ---
num_epochs = 30
for epoch in range(num_epochs):
    # model.train()
    # ... training forward pass, loss calculation, backward pass ...
    optimizer.step()  # update weights using the current learning rate

    # Optional: print the learning rate used during this epoch
    # current_lr = scheduler.get_last_lr()[0]
    # print(f"Epoch {epoch+1}, Learning Rate: {current_lr}")

    # Update the learning rate for the next epoch
    scheduler.step()

    # ... validation loop ...
# Example Output (LR changes):
# Epoch 1, Learning Rate: 0.01
# ...
# Epoch 10, Learning Rate: 0.01 (LR decreases *after* epoch 10 step)
# Epoch 11, Learning Rate: 0.001
# ...
# Epoch 20, Learning Rate: 0.001 (LR decreases *after* epoch 20 step)
# Epoch 21, Learning Rate: 0.0001
# ...
In this PyTorch example, StepLR takes the optimizer, the step_size (in epochs), and the gamma factor as arguments. The scheduler.step() call is typically made once per epoch, after optimizer.step(), to update the learning rate for the next epoch based on the schedule.
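As a quick sanity check, you can verify that StepLR reproduces the closed-form schedule αt = α0 × γ^⌊t/S⌋ from earlier. The following sketch (using a throwaway parameter purely to construct an optimizer) compares the scheduler's learning rate against the formula at every epoch:

import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

param = torch.zeros(1, requires_grad=True)  # throwaway parameter, just to build an optimizer
optimizer = SGD([param], lr=0.01)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(30):
    formula_lr = 0.01 * 0.1 ** (epoch // 10)   # alpha_0 * gamma ** floor(t / S)
    scheduler_lr = scheduler.get_last_lr()[0]  # LR in effect during this epoch
    assert abs(scheduler_lr - formula_lr) < 1e-12
    optimizer.step()
    scheduler.step()

Both agree because StepLR multiplies the optimizer's learning rate by gamma once every step_size calls to scheduler.step(), which is exactly what the closed-form expression describes.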
Step decay provides a simple, interpretable way to adjust the learning rate. While more sophisticated schedules exist, its effectiveness and ease of implementation make it a valuable tool in the deep learning practitioner's repertoire.