
Tensors and Gradient Calculation (`requires_grad`)

As we discussed in the chapter introduction, the foundation of training neural networks lies in calculating the gradient of the loss function with respect to the model's parameters. PyTorch's Autograd engine handles this complex task automatically. But how does Autograd know which calculations need to be tracked for differentiation? The answer lies in a specific attribute of PyTorch tensors: requires_grad.

The `requires_grad` Attribute

Every PyTorch tensor possesses a boolean attribute called requires_grad. This attribute acts as a flag, signaling to Autograd whether operations involving this tensor should be recorded for potential gradient computation later.

By default, when you create a tensor, its requires_grad attribute is set to False.

import torch

# Default behavior: requires_grad is False
x = torch.tensor([1.0, 2.0, 3.0])
print(f"Tensor x: {x}")
print(f"x.requires_grad: {x.requires_grad}")

# Create another tensor explicitly setting requires_grad to False
y = torch.tensor([4.0, 5.0, 6.0], requires_grad=False)
print(f"\nTensor y: {y}")
print(f"y.requires_grad: {y.requires_grad}")

This default is sensible for efficiency. Many tensors in a typical workflow don't need gradients: input data and target labels, for instance, are fixed values, and we never need the gradient of the loss with respect to them. Tracking operations on such tensors would waste memory and computation.
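To see what "not tracking" means in practice, the sketch below (tensor names are illustrative) applies an operation to tensors that all have requires_grad=False. The result also has requires_grad=False, and its .grad_fn (discussed later in this section) is None, meaning no record of the operation is kept for Autograd.

```python
import torch

# Plain data tensors: requires_grad defaults to False
inputs = torch.tensor([1.0, 2.0, 3.0])
targets = torch.tensor([0.0, 1.0, 0.0])

# An operation on untracked tensors records nothing for Autograd
diff = inputs - targets
print(diff.requires_grad)  # False
print(diff.grad_fn)        # None: no graph node was created
```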

Enabling Gradient Tracking

To instruct PyTorch to track operations and prepare for gradient computation for a specific tensor, you set its requires_grad attribute to True. There are two primary ways to do this:

During Tensor Creation: Pass requires_grad=True as an argument to the tensor creation function.

# Enable gradient tracking at creation time
w = torch.tensor([0.5, -1.0], requires_grad=True)
print(f"Tensor w: {w}")
print(f"w.requires_grad: {w.requires_grad}")
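The same keyword argument is accepted by PyTorch's other creation functions, not only torch.tensor. A minimal sketch using torch.zeros, torch.randn, and torch.ones (the shapes and names are arbitrary choices for illustration):

```python
import torch

# Factory functions accept requires_grad as a keyword argument too
w_zeros = torch.zeros(3, requires_grad=True)
w_rand = torch.randn(2, 3, requires_grad=True)
w_ones = torch.ones(2, dtype=torch.float64, requires_grad=True)

for name, t in [("w_zeros", w_zeros), ("w_rand", w_rand), ("w_ones", w_ones)]:
    # Print shape, dtype, and tracking flag for each tensor
    print(name, tuple(t.shape), t.dtype, t.requires_grad)
```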

After Tensor Creation (In-place): Use the in-place method .requires_grad_(True) on an existing tensor.

b = torch.tensor([0.1])
print(f"Tensor b (before): {b}")
print(f"b.requires_grad (before): {b.requires_grad}")

# Enable gradient tracking after creation
b.requires_grad_(True)
print(f"\nTensor b (after): {b}")
print(f"b.requires_grad (after): {b.requires_grad}")
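Two details of .requires_grad_() are convenient in practice: its argument defaults to True, and it returns the same tensor it modified, so it can be chained directly onto a creation call. The sketch below shows both, and also switches tracking back off on a leaf tensor (the names c and returned are illustrative).

```python
import torch

# requires_grad_() defaults to True and returns the tensor itself,
# so it can be chained onto a creation call
b = torch.tensor([0.1]).requires_grad_()
print(b.requires_grad)  # True

c = torch.tensor([0.2])
returned = c.requires_grad_(True)
print(returned is c)    # True: same object, modified in place

# On a leaf tensor the flag can be switched off again
c.requires_grad_(False)
print(c.requires_grad)  # False
```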

Important Note: Gradients are only defined for floating-point (and complex) tensors, such as torch.float32 or torch.float64. Derivatives describe continuous changes, which integer values cannot represent. PyTorch enforces this at creation time: setting requires_grad=True on an integer tensor raises a RuntimeError immediately, before any operation or backward() call is involved.

# Attempting requires_grad on an integer tensor
try:
    # Raises immediately: only floating-point and complex tensors can require gradients
    int_tensor = torch.tensor([1, 2], dtype=torch.int64, requires_grad=True)
    print(f"Integer tensor created with requires_grad=True: {int_tensor.requires_grad}")
except RuntimeError as e:
    print(f"Error setting requires_grad on integer tensor: {e}")

# Best practice: Use float tensors for parameters/computations needing gradients
float_tensor = torch.tensor([1.0, 2.0], requires_grad=True)
print(f"\nFloat tensor created with requires_grad=True: {float_tensor.requires_grad}")
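If your data arrives as integers but you do want gradients for it, one option is to convert it to a floating-point dtype first and then enable tracking. A minimal sketch, assuming float32 is an acceptable precision; the name counts is illustrative:

```python
import torch

counts = torch.tensor([1, 2, 3])  # inferred dtype: torch.int64

# Convert to floating point first, then enable gradient tracking
counts_f = counts.to(torch.float32).requires_grad_()
print(counts_f.dtype, counts_f.requires_grad)  # torch.float32 True

# Enabling tracking on the integer tensor itself raises RuntimeError
try:
    counts.requires_grad_()
    int_error = None
except RuntimeError as e:
    int_error = e
    print("RuntimeError:", e)
```

Note that .to() returns a new tensor, so the original counts tensor stays an untracked integer tensor.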

Propagation of `requires_grad`

Crucially, the requires_grad status propagates through operations. If any input tensor participating in an operation has requires_grad=True, the output tensor will automatically have requires_grad=True (unless gradient tracking has been switched off, for example inside a torch.no_grad() block). This ensures that the entire chain of calculations involving parameters, which typically have requires_grad=True, is tracked.

Let's illustrate this:

# Define tensors: x (input), w (weight), b (bias)
x = torch.tensor([1.0, 2.0]) # Input data, gradients not needed
w = torch.tensor([0.5, -1.0], requires_grad=True) # Weight parameter, track gradients
b = torch.tensor([0.1], requires_grad=True) # Bias parameter, track gradients

print(f"x requires_grad: {x.requires_grad}")
print(f"w requires_grad: {w.requires_grad}")
print(f"b requires_grad: {b.requires_grad}")

# Perform an operation: y = w * x + b
# Note: PyTorch handles broadcasting for b
intermediate = w * x
print(f"\nintermediate (w * x) requires_grad: {intermediate.requires_grad}")

y = intermediate + b
print(f"y requires_grad: {y.requires_grad}")

Notice that even though x did not require gradients, because w required gradients, the result of w * x (intermediate) also requires gradients. Subsequently, since intermediate required gradients (and b also did), the final output y also has requires_grad=True.
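The propagation rule can be checked case by case. In the sketch below (names are illustrative), the output requires gradients only when at least one input does, and the torch.no_grad() context manager suspends propagation entirely:

```python
import torch

a = torch.tensor([1.0, 2.0])                      # not tracked
p = torch.tensor([3.0, 4.0], requires_grad=True)  # tracked

# Output requires grad only if at least one input does
print((a * 2).requires_grad)  # False: no input requires grad
print((a * p).requires_grad)  # True: p requires grad

# Inside torch.no_grad(), results are not tracked even if inputs are
with torch.no_grad():
    out = a * p
print(out.requires_grad)      # False
```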

The `.grad_fn` Attribute

This propagation is intrinsically linked to how PyTorch builds the computation graph. When an operation creates a new tensor whose requires_grad is True, PyTorch sets that tensor's .grad_fn attribute. It references a backward node (such as AddBackward0 or MulBackward0) that records which operation produced the tensor and knows how to compute the corresponding gradients during the backward pass.

Tensors created directly by the user (like our x, w, and b examples above) are considered "leaf" tensors in the graph. Their .grad_fn is None because they weren't created by a tracked operation; by convention, tensors with requires_grad=False, such as x, are leaves as well. Tensors resulting from operations on tensors that require gradients are "non-leaf" tensors and have a .grad_fn.
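PyTorch also exposes leaf status directly through the .is_leaf attribute. This sketch rebuilds the tensors from the example above and prints is_leaf alongside grad_fn for each:

```python
import torch

x = torch.tensor([1.0, 2.0])
w = torch.tensor([0.5, -1.0], requires_grad=True)
b = torch.tensor([0.1], requires_grad=True)
intermediate = w * x
y = intermediate + b

# is_leaf is True for user-created tensors, False for tracked results
for name, t in [("x", x), ("w", w), ("b", b), ("intermediate", intermediate), ("y", y)]:
    print(f"{name:12s} is_leaf={t.is_leaf}  grad_fn={t.grad_fn}")
```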

Let's inspect the .grad_fn from our previous example:

print(f"\nx.grad_fn: {x.grad_fn}")
print(f"w.grad_fn: {w.grad_fn}")
print(f"b.grad_fn: {b.grad_fn}")
print(f"intermediate.grad_fn: {intermediate.grad_fn}") # Result of multiplication
print(f"y.grad_fn: {y.grad_fn}") # Result of addition

You can see that x, w, and b (our leaf tensors) have grad_fn=None. In contrast, intermediate has a MulBackward0 function, and y has an AddBackward0 function, indicating the operations that created them. This chain of grad_fn references is the dynamic computation graph that Autograd uses.

[Figure: A simplified view of the computation graph for y = w * x + b. Tensors requiring gradients are highlighted in blue. Notice how operations (*, +) create new tensors (intermediate, y) which reference the operation via grad_fn if gradient tracking is enabled through their inputs.]
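The chain of grad_fn references can also be walked in code. Each backward node has a next_functions attribute: a tuple of (node, index) pairs pointing toward the inputs. Leaf tensors that require gradients appear as AccumulateGrad nodes, and inputs that don't require gradients appear as None. The recursive walk helper below is a hypothetical illustration, not part of PyTorch:

```python
import torch

x = torch.tensor([1.0, 2.0])
w = torch.tensor([0.5, -1.0], requires_grad=True)
b = torch.tensor([0.1], requires_grad=True)
y = w * x + b

def walk(fn, depth=0):
    """Hypothetical helper: print the backward graph starting at a grad_fn node."""
    if fn is None:
        print("  " * depth + "None (input did not require grad)")
        return
    print("  " * depth + type(fn).__name__)
    # next_functions holds (node, input_index) pairs for each input of the op
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1)

walk(y.grad_fn)
```

The printed tree mirrors the figure: AddBackward0 at the root, with MulBackward0 (whose inputs are w's AccumulateGrad and None for x) and b's AccumulateGrad beneath it.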

By setting requires_grad=True on the tensors we want to optimize (typically model parameters like weights w and biases b), we enable Autograd to build this graph and trace the computations back from the final output (usually the loss) to these parameters, preparing everything for the gradient calculation step using .backward(), which we will cover next.
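As a brief preview of where this leads, the sketch below reduces y to a scalar with sum() (an illustrative stand-in for a real loss) and calls .backward(). Gradients land only on the leaf tensors that requested them:

```python
import torch

x = torch.tensor([1.0, 2.0])
w = torch.tensor([0.5, -1.0], requires_grad=True)
b = torch.tensor([0.1], requires_grad=True)

y = w * x + b
loss = y.sum()   # scalar output, so backward() needs no extra argument
loss.backward()  # follow the grad_fn chain back to the leaf tensors

print(w.grad)    # tensor([1., 2.]): d(loss)/dw equals x
print(b.grad)    # tensor([2.]): b was broadcast to 2 elements, contributions sum
print(x.grad)    # None: x did not require gradients
```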

