Before entering the iterative process of training, we need to prepare the core components: the model itself, a way to measure its error (the loss function), and a mechanism to update the model based on that error (the optimizer). This setup phase ensures all necessary pieces are initialized and ready for the training loop.
First, you need an instance of your neural network model. In Chapter 4, you learned how to define custom network architectures by subclassing torch.nn.Module. Now, you simply create an object of that class:
# Assuming 'SimpleNet' is your custom nn.Module class defined earlier
model = SimpleNet(input_size=784, hidden_size=128, output_size=10)
print(model)
This creates the network structure, including all its layers and parameters (weights and biases). Initially, these parameters have random values (or values determined by specific initialization schemes if you implemented them).
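If you want to confirm what was created, you can iterate over the model's parameters and print their names and shapes. This is just an inspection sketch; the exact parameter names depend on how SimpleNet defines its layers.
# Inspect the freshly initialized parameters (names depend on your SimpleNet definition)
for name, param in model.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)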
Deep learning computations, especially training, are significantly faster on GPUs. PyTorch makes it straightforward to move your model to the appropriate device (CPU or GPU). It's good practice to define the target device early on and then consistently move both the model and the data to it.
import torch
# Determine the available device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
# Move the model to the chosen device
model.to(device)
Executing model.to(device) modifies the model in place, moving all its parameters and buffers to GPU memory if CUDA is available, otherwise keeping them on the CPU. Remember, any tensor involved in computations with the model (like input data) must also reside on the same device. We'll handle moving data tensors inside the training loop.
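To illustrate what that looks like, here is a small standalone sketch; in practice this happens for each batch inside the loop. The batch size and feature count are illustrative and assume the 784-feature input defined above.
# Illustrative batch: 32 samples with 784 features each (matches input_size above)
inputs = torch.randn(32, 784)
inputs = inputs.to(device)   # data must be on the same device as the model
outputs = model(inputs)      # forward pass now runs without a device mismatch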
The loss function, often called the criterion, quantifies how far the model's predictions are from the actual target values. PyTorch provides numerous standard loss functions within the torch.nn module. The choice depends heavily on the type of problem you are solving (e.g., regression, classification).
For a multi-class classification problem, nn.CrossEntropyLoss is common. It combines nn.LogSoftmax and nn.NLLLoss (Negative Log Likelihood Loss) in one efficient class.
# For multi-class classification
loss_fn = torch.nn.CrossEntropyLoss()
# For regression problems (predicting continuous values)
# loss_fn = torch.nn.MSELoss() # Mean Squared Error Loss
You instantiate the chosen loss function just like the model. This loss_fn object will be called later within the training loop, typically taking the model's output and the ground truth labels as input to compute a scalar loss value.
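As a rough sketch of that later call, assuming the predictions are raw logits of shape (batch_size, 10) and the labels are integer class indices:
# Illustrative only: predictions would normally come from model(inputs)
predictions = torch.randn(32, 10, device=device)      # raw logits for 10 classes
labels = torch.randint(0, 10, (32,), device=device)   # ground-truth class indices
loss = loss_fn(predictions, labels)                   # scalar loss tensor
print(loss.item())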
The optimizer implements an algorithm (like Stochastic Gradient Descent or Adam) to adjust the model's parameters based on the gradients computed during backpropagation. The goal is to minimize the loss function. Optimizers are found in the torch.optim package.
When initializing an optimizer, you must provide two essential arguments:
1. The model's parameters: model.parameters(), which returns an iterator over all trainable parameters in the model.
2. The learning rate (lr): This hyperparameter controls the step size for parameter updates. Finding a good learning rate is important for effective training. It often requires experimentation.
import torch.optim as optim
# Using Stochastic Gradient Descent (SGD)
learning_rate = 0.01
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
# Alternatively, using the Adam optimizer
# optimizer = optim.Adam(model.parameters(), lr=0.001)
Here, we created an SGD optimizer instance. It now holds references to all of the model's parameters and knows the learning rate to use when its step() method is called later. Different optimizers might have additional hyperparameters (like momentum for SGD, or betas for Adam) that you can configure during initialization.
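For instance, those extra arguments could be passed like this; the specific values are illustrative, not recommendations (the betas shown are also Adam's defaults).
# SGD with momentum (value is illustrative)
# optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Adam with explicit beta coefficients
# optimizer = optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))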
With the model instantiated and moved to the correct device, the loss function defined, and the optimizer configured to update the model's parameters, we have all the necessary components set up. We are now ready to proceed to the core of the training process: iterating through the data and performing the forward pass, loss calculation, backpropagation, and parameter updates within the training loop.
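As a preview of how these pieces fit together, a single training step might look roughly like the sketch below; the inputs and labels are placeholders standing in for a real batch from a data loader.
# Rough sketch of one training step (covered in detail in the training loop section)
# inputs, labels = ...            # a batch from your data loader, moved to `device`
optimizer.zero_grad()             # clear gradients accumulated from the previous step
outputs = model(inputs)           # forward pass
loss = loss_fn(outputs, labels)   # compute the loss
loss.backward()                   # backpropagate to compute gradients
optimizer.step()                  # update the parameters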