Okay, your deep learning model is now defined with its layers and activation functions. You've also compiled it, specifying the loss function to measure error and the optimizer (such as Adam or SGD) that will guide learning by updating the model's weights. The next step is to actually train the model: feed it the prepared training data and let the optimizer adjust the weights iteratively to minimize the chosen loss function.
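As a quick recap in code, here is a minimal sketch of that setup on the PyTorch side, which is what the training loop later in this section assumes. The model architecture shown is only a placeholder; the criterion and optimizer match the ones referenced below.
import torch.nn as nn
import torch.optim as optim
# Placeholder model for illustration; use the model you defined in the previous sections
model = nn.Sequential(
    nn.Linear(20, 64),  # 20 input features (illustrative)
    nn.ReLU(),
    nn.Linear(64, 3),   # 3 output classes (illustrative)
)
criterion = nn.CrossEntropyLoss()                     # loss function measuring prediction error
optimizer = optim.Adam(model.parameters(), lr=0.001)  # optimizer that will update the weights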
Training a neural network is fundamentally an iterative process. We don't just show the model the data once. Instead, we repeatedly expose it to the data, allowing it to gradually learn the underlying patterns. Each iteration involves several steps: the model makes predictions on a batch of data (the forward pass), the loss function measures how far those predictions are from the targets, backpropagation computes the gradients of the loss with respect to the weights (the backward pass), and the optimizer uses those gradients to update the weights.
This cycle repeats for many batches of data.
Deep learning frameworks provide convenient ways to manage this training loop. In Keras, this is often done using a single method called fit. In PyTorch, you typically write the loop explicitly, which offers more fine-grained control. Regardless of the specific implementation, the core concepts remain the same, and you'll need to specify several important parameters:
The first thing to provide is the training data: the input data (X_train) and corresponding target labels (y_train) that the model will learn from. We assume this data has already been preprocessed (e.g., scaled, reshaped) as discussed previously.
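If your preprocessed features and labels are NumPy arrays, one common way to get them into the tensor form used later in this section is sketched below. The array names, shapes, and dtypes here are assumptions about your preprocessing output, not values from earlier sections.
import numpy as np
import torch
# Placeholder arrays standing in for your preprocessed training data
X_train = np.random.rand(10000, 20).astype(np.float32)  # (samples, features)
y_train = np.random.randint(0, 3, size=10000)           # integer class labels
X_train_tensor = torch.from_numpy(X_train)          # float32 feature tensor
y_train_tensor = torch.from_numpy(y_train).long()   # int64 labels, as expected by nn.CrossEntropyLoss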
An epoch represents one complete pass through the entire training dataset. If your dataset has 10,000 samples and you train for 10 epochs, the model will see each sample 10 times during training (though likely in different batches and orders).
Choosing the number of epochs is important. Too few epochs and training stops before the model has captured the underlying patterns (underfitting); too many and the model can start to memorize the training data and perform worse on new data (overfitting).
Instead of processing the entire dataset at once for each weight update (which can be computationally infeasible for large datasets), we typically divide the training data into smaller chunks called mini-batches. The batch size defines how many training samples are processed in each forward/backward pass before the model's weights are updated.
The choice of batch size affects several aspects of training. Larger batches use more memory per update but make better use of parallel hardware such as GPUs, and they produce smoother, more stable gradient estimates. Smaller batches use less memory and update the weights more frequently, but the gradient estimates are noisier, which can make training less stable (though this noise sometimes helps generalization).
An epoch involves processing the entire dataset, typically split into mini-batches. For each batch, the model performs a forward pass, calculates loss, performs a backward pass (backpropagation), and updates its weights via the optimizer. This cycle repeats for all batches within the epoch, and then for multiple epochs.
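To make the bookkeeping concrete, the following short sketch works out how many weight updates occur, using the 10,000-sample dataset mentioned earlier with an illustrative batch size of 64 and 10 epochs.
import math
num_samples = 10_000   # dataset size from the example above
batch_size = 64        # illustrative batch size
num_epochs = 10        # illustrative number of epochs
batches_per_epoch = math.ceil(num_samples / batch_size)  # last batch may be smaller: 157
total_updates = batches_per_epoch * num_epochs           # 1,570 weight updates in total
print(f"{batches_per_epoch} batches per epoch, {total_updates} weight updates overall")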
Let's see how this looks in PyTorch. Assume you have your model, criterion (loss function, e.g., nn.CrossEntropyLoss), optimizer (e.g., optim.Adam), and a train_loader (a DataLoader that provides batches of data).
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
# Assume these are defined elsewhere:
# model: Your neural network model (subclass of nn.Module)
# criterion: Your loss function (e.g., nn.CrossEntropyLoss())
# optimizer: Your optimizer (e.g., optim.Adam(model.parameters(), lr=0.001))
# X_train_tensor, y_train_tensor: Your training data as PyTorch tensors
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # To run on GPU if possible
# --- Hyperparameters ---
BATCH_SIZE = 64
NUM_EPOCHS = 10
# --- Prepare DataLoader ---
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
# shuffle=True is important for training to ensure batches are different each epoch
train_loader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True)
# Move model to the correct device (CPU or GPU)
# model.to(device)
# --- Training Loop ---
model.train() # Set the model to training mode (important for layers like Dropout, BatchNorm)
print("Starting Training...")
for epoch in range(NUM_EPOCHS):
    running_loss = 0.0
    num_batches = len(train_loader)
    for i, batch in enumerate(train_loader):
        # 1. Get data from the batch and move to device
        # inputs, labels = batch[0].to(device), batch[1].to(device)
        inputs, labels = batch  # Assuming data is already on the correct device or CPU for simplicity

        # 2. Zero the parameter gradients (essential before backward pass)
        optimizer.zero_grad()

        # 3. Forward pass: Compute predictions
        outputs = model(inputs)

        # 4. Calculate loss
        loss = criterion(outputs, labels)

        # 5. Backward pass: Compute gradients
        loss.backward()

        # 6. Optimize: Update weights based on gradients
        optimizer.step()

        # Accumulate loss for reporting
        running_loss += loss.item()  # .item() gets the scalar value from the loss tensor

        # Optional: Print progress periodically
        if (i + 1) % 100 == 0 or (i + 1) == num_batches:  # Print every 100 mini-batches or at the end of epoch
            print(f'Epoch [{epoch + 1}/{NUM_EPOCHS}], Batch [{i + 1}/{num_batches}], Loss: {loss.item():.4f}')
            # Note: For a running average loss: {running_loss / (i + 1):.4f}

    epoch_loss = running_loss / num_batches
    print(f'Epoch [{epoch + 1}/{NUM_EPOCHS}] completed. Average Loss: {epoch_loss:.4f}')

print('Finished Training')

# --- Keras Equivalent (Conceptual) ---
# For comparison, the Keras equivalent encapsulates this loop:
# history = model.fit(X_train_tensor.numpy(), y_train_tensor.numpy(),
#                     epochs=NUM_EPOCHS,
#                     batch_size=BATCH_SIZE,
#                     shuffle=True,
#                     verbose=2)  # verbose controls how much info is printed
# print('Finished Training')
This PyTorch loop explicitly performs the steps outlined earlier: zeroing gradients, forward pass, loss calculation, backward pass, and optimizer step for each batch. The outer loop iterates through the specified number of epochs. Calling model.train() is important because some layers behave differently during training and evaluation. Keras abstracts this loop into the model.fit() call, managing the batch iteration, shuffling, and weight updates internally based on the epochs and batch_size arguments you provide.
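As a quick illustration of why the mode matters, here is a small sketch of the counterpart call you would typically make before evaluating: model.eval() switches layers such as Dropout and BatchNorm to their inference behavior, and torch.no_grad() disables gradient tracking. The toy model and tensor shapes are placeholders, not part of the model built earlier.
import torch
import torch.nn as nn
# A toy model containing a Dropout layer, purely for illustration
toy_model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(8, 2))
x = torch.randn(4, 8)  # placeholder batch of 4 samples with 8 features
toy_model.train()                 # Dropout randomly zeroes activations during training
train_mode_output = toy_model(x)
toy_model.eval()                  # Dropout becomes a no-op during evaluation
with torch.no_grad():             # no gradients are tracked, saving memory and compute
    eval_mode_output = toy_model(x)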
Executing this training loop (either explicitly or via a method like fit) is where the learning happens. The model's parameters are adjusted based on the training data, aiming to minimize the loss function. However, simply running the loop isn't enough. We need to observe how training is progressing to make informed decisions. The next section will cover how to monitor metrics like loss and accuracy during training to understand model behavior and diagnose potential problems.