When computing gradients for a scalar tensor (often the loss) with respect to other tensors in the computation graph, PyTorch uses the backward() method. This method triggers the gradient calculation, but it does not return the gradients directly. Instead, PyTorch stores the computed gradients in a special attribute of the tensors themselves: the .grad attribute.
This attribute is primarily populated for the leaf tensors in the computation graph for which you explicitly requested gradient tracking by setting requires_grad=True. Remember, leaf tensors are typically the ones you created directly, like model parameters or inputs, as opposed to intermediate tensors resulting from operations.
The .grad attribute holds a tensor of the same shape as the original tensor it belongs to. Each element in the .grad tensor represents the partial derivative of the scalar (on which backward() was called) with respect to the corresponding element in the original tensor. If L is the scalar loss and w is a tensor parameter, then after calling L.backward(), the attribute w.grad will contain the tensor representing ∂L/∂w.
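Before the scalar walkthrough below, here is a quick sketch of these two points; the shapes and names are purely illustrative. It shows that .grad matches the shape of the tensor it belongs to, and that only leaf tensors with requires_grad=True have it populated:

import torch

# A 2x3 weight matrix (a leaf tensor we created) and a fixed input vector.
W = torch.randn(2, 3, requires_grad=True)
x = torch.randn(3)

h = W @ x            # intermediate (non-leaf) tensor, shape (2,)
loss = h.sum()       # scalar

loss.backward()

print(W.grad.shape)          # torch.Size([2, 3]) -- same shape as W
print(W.is_leaf, h.is_leaf)  # True False
print(h.grad)                # None: gradients of non-leaf tensors are not retained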
Let's illustrate this with a simple example:
import torch
# Create input tensors that require gradients
x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
# Define a simple computation
y = w * x + b # y = 3.0 * 2.0 + 1.0 = 7.0
# Compute gradients
y.backward()
# Access the gradients stored in the .grad attribute
print(f"Gradient of y with respect to x (dy/dx): {x.grad}")
print(f"Gradient of y with respect to w (dy/dw): {w.grad}")
print(f"Gradient of y with respect to b (dy/db): {b.grad}")
# Create a tensor that does NOT require gradients
z = torch.tensor(4.0, requires_grad=False)
print(f"Gradient for tensor z (requires_grad=False): {z.grad}")
Expected Output:
Gradient of y with respect to x (dy/dx): 3.0
Gradient of y with respect to w (dy/dw): 2.0
Gradient of y with respect to b (dy/db): 1.0
Gradient for tensor z (requires_grad=False): None
In this example:
- We created x, w, and b with requires_grad=True, marking them as leaf nodes whose gradients we want.
- We called y.backward(). Autograd traversed the graph backward from y to compute the gradients.
- The results were stored in x.grad, w.grad, and b.grad. Accessing these attributes reveals the calculated tensor values.
- z was created with requires_grad=False, so it was not part of the gradient computation tracked by Autograd, and its .grad attribute remains None.

It's important to remember that gradients accumulate by default. If you call backward() multiple times on potentially different parts of your graph (or the same part) without clearing the gradients, the newly computed gradients will be added to the values already present in the .grad attribute. This behavior is intentional and useful for scenarios like gradient accumulation across mini-batches, but in typical training loops you need to explicitly zero the gradients before each backpropagation step. This is commonly done using optimizer.zero_grad(), which we will discuss further when constructing training loops.
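Here is a minimal sketch of that accumulation behavior; the tensor and values are illustrative, and .grad.zero_() stands in for what optimizer.zero_grad() does for every parameter:

import torch

w = torch.tensor(3.0, requires_grad=True)

# First backward pass: d(w*w)/dw = 2*w = 6
(w * w).backward()
print(w.grad)   # tensor(6.)

# Second backward pass without clearing: gradients accumulate (6 + 6 = 12)
(w * w).backward()
print(w.grad)   # tensor(12.)

# Reset the stored gradient before the next step
w.grad.zero_()
print(w.grad)   # tensor(0.)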
For now, the main takeaway is that after loss.backward(), the gradients you need for updating your model parameters are available directly within the .grad attribute of those parameter tensors.
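As a preview of how those stored gradients get used, the sketch below performs a single manual update; the toy loss and learning rate are assumptions for illustration, and in practice an optimizer such as torch.optim.SGD handles this step for you:

import torch

w = torch.tensor(3.0, requires_grad=True)
lr = 0.1                   # illustrative learning rate

loss = (w - 1.0) ** 2      # toy loss, minimized at w = 1
loss.backward()            # populates w.grad with d(loss)/dw = 2*(w - 1) = 4

with torch.no_grad():      # the update itself must not be tracked by Autograd
    w -= lr * w.grad       # 3.0 - 0.1 * 4.0 = 2.6

w.grad.zero_()             # clear the gradient before the next iteration
print(w)                   # tensor(2.6000, requires_grad=True)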