To understand how PyTorch's Autograd automatically computes gradients, we first need to look at the underlying mechanism: computation graphs. Every time you perform an operation involving tensors for which gradients are required (we'll see how to specify this soon), PyTorch dynamically builds a graph representing the sequence of computations.
Think of this graph as a directed acyclic graph (DAG) where:
Nodes represent the tensors (inputs, intermediate results, outputs) and the operations applied to them.
Edges represent the flow of data, recording which tensors each operation consumed and produced and therefore the dependencies between computations.
Consider a simple calculation:
import torch
# Tensors that require gradients
x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
# Operations
y = w * x # Intermediate result 'y'
z = y + b # Final result 'z'
print(f"Result z: {z}")
Output:
Result z: 7.0
As these lines execute, PyTorch constructs a graph behind the scenes. It looks something like this conceptually:
Conceptual representation of the computation graph for z = (w * x) + b. Blue boxes are input tensors, yellow ellipses are operations, and green boxes are output/intermediate tensors. Edges show data flow and dependencies. The grad_fn attribute on tensors resulting from operations points back to the function that created them.
A significant characteristic of PyTorch's computation graphs is their dynamic nature. Unlike frameworks that require you to define the entire graph structure before running computations, PyTorch builds the graph on-the-fly as your Python code executes. This means standard Python control flow (such as if conditions or for loops) can directly influence the graph structure from one iteration to the next. If your model's architecture needs to change based on the input data during the forward pass, PyTorch handles this naturally.
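For instance, the short sketch below (the names a and out are only illustrative, not part of the running example) shows how a branch and a loop change which operations get recorded in the graph:
import torch
a = torch.tensor(2.0, requires_grad=True)
# The branch actually taken determines which operation is added to the graph
if a.item() > 0:
    out = a * 3       # records a multiplication node
else:
    out = a ** 2      # would record a power node instead
# Each loop iteration appends another addition node, so the graph grows with the loop
for _ in range(3):
    out = out + 1
out.backward()
print(a.grad)         # tensor(3.) because out = 3*a + 3 along the branch taken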
Forward Pass: When you execute tensor operations (like y = w * x), you are performing the forward pass. PyTorch records the operations and the tensors involved, building the graph. Tensors that result from operations tracked by Autograd will have a grad_fn attribute (like y and z in the example). This attribute references the function that created the tensor and holds references to its inputs, forming the backward links in the graph. User-created tensors (like x, w, and b) usually have grad_fn=None.
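You can see these backward links directly by inspecting grad_fn on the tensors from the example above (the exact object addresses in the output will vary):
print(y.grad_fn)   # something like <MulBackward0 object at 0x...> - y came from a multiplication
print(z.grad_fn)   # something like <AddBackward0 object at 0x...> - z came from an addition
print(x.grad_fn)   # None, because x was created directly by the user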
Backward Pass: When you later call .backward() on a scalar tensor (typically the final loss value), Autograd traverses this graph backward from that node. It uses the chain rule of calculus, guided by the grad_fn at each step, to compute the gradients of the scalar output with respect to the tensors that were initially marked with requires_grad=True (usually model parameters or inputs).
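Continuing the running example, calling z.backward() populates the .grad attribute of each leaf tensor with the derivative of z with respect to that tensor:
z.backward()       # traverse the graph from z back to the leaves, applying the chain rule
# z = w*x + b, so dz/dx = w, dz/dw = x, and dz/db = 1
print(x.grad)      # tensor(3.)
print(w.grad)      # tensor(2.)
print(b.grad)      # tensor(1.)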
In the context of Autograd, the tensors in the graph fall into two groups:
Leaf Tensors: Tensors created directly by the user (e.g., with torch.tensor(), torch.randn()) that have requires_grad=True. Model parameters (nn.Parameter, which we'll see later) are also leaf tensors.
Intermediate Tensors: Tensors produced by operations on other tensors (like y and z above). They have a grad_fn associated with them.
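You can check which group a tensor falls into with the is_leaf attribute, shown here on the tensors from the example:
print(x.is_leaf, w.is_leaf, b.is_leaf)   # True True True  (created directly by the user)
print(y.is_leaf, z.is_leaf)              # False False     (produced by tracked operations)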
By default, gradients computed during the .backward() call are only retained and accumulated in the .grad attribute of the leaf tensors that have requires_grad=True. The gradients for intermediate tensors are computed but generally discarded after use to save memory, unless explicitly requested otherwise (e.g., using .retain_grad()).
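As a brief illustration, the sketch below rebuilds the earlier graph and keeps the gradient of the intermediate tensor y by calling retain_grad() before the backward pass:
import torch
x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
y = w * x
y.retain_grad()    # ask Autograd to keep the gradient of this intermediate tensor
z = y + b
z.backward()
print(y.grad)      # tensor(1.) since dz/dy = 1; without retain_grad() this would be None
print(x.grad)      # tensor(3.) - leaf gradients are always retained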
Understanding this graph structure is fundamental to grasping how Autograd operates. It connects the forward computations you define in your model directly to the gradient calculations needed for optimization. Next, we'll examine how to explicitly control gradient tracking using requires_grad.