When you call the backward() method on a scalar tensor (often the loss), PyTorch computes the gradients of that scalar with respect to the tensors in the computation graph. However, backward() itself doesn't return these gradients directly. Instead, PyTorch stores the computed gradients in a special attribute of the tensors themselves: the .grad attribute.
This attribute is primarily populated for the leaf tensors in the computation graph for which you explicitly requested gradient tracking by setting requires_grad=True
. Remember, leaf tensors are typically the ones you created directly, like model parameters or inputs, as opposed to intermediate tensors resulting from operations.
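You can check this distinction directly through a tensor's is_leaf attribute. The short sketch below (the tensor names and values are just for illustration) shows that an intermediate result does not receive a .grad value unless you explicitly opt in with retain_grad():
import torch
# A leaf tensor created directly, with gradient tracking enabled
w = torch.tensor(3.0, requires_grad=True)
# An intermediate (non-leaf) tensor produced by an operation
y = w * 2
print(w.is_leaf)  # True  -> w.grad will be populated by backward()
print(y.is_leaf)  # False -> y.grad stays None by default
y.retain_grad()   # opt in to storing the intermediate gradient
loss = y * 5
loss.backward()
print(w.grad)     # tensor(10.) since d(loss)/dw = 2 * 5
print(y.grad)     # tensor(5.), only available because of retain_grad()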
The .grad
attribute holds a tensor of the same shape as the original tensor it belongs to. Each element in the .grad
tensor represents the partial derivative of the scalar (on which backward()
was called) with respect to the corresponding element in the original tensor. If L is the scalar loss and w is a tensor parameter, then after calling L.backward()
, the attribute w.grad
will contain the tensor representing ∂L/∂w.
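The shape correspondence is easy to verify. Here is a minimal sketch (the shapes and values are arbitrary) that reduces a 2x3 parameter to a scalar loss and inspects the resulting gradient:
import torch
# A 2x3 "parameter" tensor with gradient tracking
w = torch.randn(2, 3, requires_grad=True)
x = torch.randn(3)
# Reduce to a scalar so backward() needs no extra arguments
loss = (w @ x).sum()
loss.backward()
print(w.shape)       # torch.Size([2, 3])
print(w.grad.shape)  # torch.Size([2, 3]), one partial derivative per element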
Let's now walk through the full process with a simple scalar example:
import torch
# Create input tensors that require gradients
x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
# Define a simple computation
y = w * x + b # y = 3.0 * 2.0 + 1.0 = 7.0
# Compute gradients
y.backward()
# Access the gradients stored in the .grad attribute
print(f"Gradient of y with respect to x (dy/dx): {x.grad}")
print(f"Gradient of y with respect to w (dy/dw): {w.grad}")
print(f"Gradient of y with respect to b (dy/db): {b.grad}")
# Create a tensor that does NOT require gradients
z = torch.tensor(4.0, requires_grad=False)
print(f"Gradient for tensor z (requires_grad=False): {z.grad}")
Expected Output:
Gradient of y with respect to x (dy/dx): 3.0
Gradient of y with respect to w (dy/dw): 2.0
Gradient of y with respect to b (dy/db): 1.0
Gradient for tensor z (requires_grad=False): None
In this example:
- We created x, w, and b with requires_grad=True, marking them as leaf nodes whose gradients we want.
- We called y.backward(). Autograd traversed the graph backward from y to compute the gradients.
- The computed gradients were stored in x.grad, w.grad, and b.grad. Accessing these attributes reveals the calculated tensor values.
- z was created with requires_grad=False, so it was not part of the gradient computation tracked by Autograd, and its .grad attribute remains None.
It's important to remember that gradients accumulate by default. If you call backward()
multiple times on potentially different parts of your graph (or the same part) without clearing the gradients, the newly computed gradients will be added to the values already present in the .grad
attribute. This behavior is intentional and useful for scenarios like gradient accumulation across mini-batches, but in typical training loops, you need to explicitly zero the gradients before each backpropagation step. This is commonly done using optimizer.zero_grad()
, which we will discuss further when constructing training loops.
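The accumulation behavior is easy to observe with a standalone tensor. In this minimal sketch (no model or optimizer involved), calling backward() twice without clearing sums the gradients, and zeroing the .grad tensor resets them:
import torch
w = torch.tensor(3.0, requires_grad=True)
# First backward pass
(w * 2).backward()
print(w.grad)  # tensor(2.)
# Second backward pass without zeroing: gradients accumulate
(w * 2).backward()
print(w.grad)  # tensor(4.)
# Clear the gradient manually; optimizer.zero_grad() does the equivalent
# for every parameter registered with the optimizer
w.grad.zero_()
(w * 2).backward()
print(w.grad)  # tensor(2.)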
For now, the main takeaway is that after loss.backward()
, the gradients you need for updating your model parameters are available directly within the .grad
attribute of those parameter tensors.
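As a preview of how these gradients are consumed, here is a hand-rolled sketch of a single gradient descent step (in practice an optimizer such as torch.optim.SGD performs this update for you); note that it reads the parameter's .grad directly:
import torch
w = torch.tensor([1.0, -2.0], requires_grad=True)
x = torch.tensor([0.5, 3.0])
target = torch.tensor(4.0)
# A simple squared-error loss
loss = ((w * x).sum() - target) ** 2
loss.backward()
lr = 0.1
with torch.no_grad():     # do not track the update itself in the graph
    w -= lr * w.grad      # gradient descent step using the stored gradient
w.grad.zero_()            # clear before the next iteration
print(w)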