While creating and manipulating tensors in PyTorch often feels intuitive, understanding their internal structure is beneficial for writing efficient code, debugging complex behaviors, and building custom operations. A torch.Tensor is more than just a multi-dimensional array; it's a sophisticated object containing metadata that defines how raw numerical data stored in memory is interpreted.
At its core, every PyTorch tensor holds a reference to a torch.Storage object. Think of torch.Storage as a contiguous, one-dimensional array of numerical data of a specific type (e.g., float32, int64). The Tensor object itself doesn't directly contain the numbers but holds metadata describing how to view the data within its associated Storage.
This separation is important because multiple tensors can share the same underlying Storage. Operations like slicing, transposing, or reshaping often create new Tensor objects with different metadata that point to the same memory block managed by the Storage. This makes these operations very memory-efficient, as they typically don't involve copying data.
import torch
# Create a tensor
x = torch.arange(12, dtype=torch.float32)
print(f"Original tensor x: {x}")
# Storage is a 1D array of 12 floats
print(f"Storage elements: {x.storage().tolist()}")
print(f"Storage type: {x.storage().dtype}")
print(f"Storage size: {len(x.storage())}")
# Create a view by reshaping
y = x.view(3, 4)
print(f"\nReshaped tensor y:\n{y}")
# y has different shape/strides but shares the same storage
print(f"Does y share storage with x? {y.storage().data_ptr() == x.storage().data_ptr()}")
# Modifying the view affects the original (and vice versa)
y[0, 0] = 99.0
print(f"\nModified y:\n{y}")
print(f"Original x after modifying y: {x}")
In the example above, x and y are distinct Tensor objects, but because y was created with view(), they share the same underlying Storage. Modifying an element in y also changes the corresponding element visible through x.
Besides the reference to its Storage, a Tensor object maintains several pieces of metadata that define its properties and interpretation of the data:
Device (device): Specifies where the tensor's data resides, either on the CPU (torch.device('cpu')) or a specific GPU (torch.device('cuda:0')). Data must typically be on the same device for operations between tensors. Moving data between devices (e.g., using .to(device)) involves memory copies and can be a performance consideration.

Data Type (dtype): Defines the numerical type of the elements in the tensor, such as torch.float32, torch.int64, or torch.bool. Operations usually require tensors to have compatible dtypes, and the choice of dtype significantly impacts memory usage and numerical precision.

Shape (shape or size()): A tuple representing the dimensions of the tensor. For example, a 3x4 matrix has a shape of (3, 4).

Storage Offset (storage_offset()): An integer indicating the index in the underlying Storage where this tensor's data begins. For a tensor created directly (not as a view), this is usually 0. Slices might have a non-zero offset, as shown in the example after this list.

Stride (stride()): This is perhaps the most critical piece of metadata for understanding memory layout. The stride is a tuple where the i-th element specifies the jump in memory (number of elements in the Storage) needed to move one step along the i-th dimension of the tensor.
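A quick way to inspect this metadata is to print the relevant attributes for a base tensor and for a slice of it. The snippet below is a small sketch (it assumes torch has already been imported as above and uses illustrative variable names); note that the sliced row starts at a non-zero offset into the shared storage.
base = torch.arange(12, dtype=torch.float32).view(3, 4)
row = base[1]  # slicing out the second row produces a view, not a copy
print(f"device={base.device}, dtype={base.dtype}, shape={tuple(base.shape)}")
print(f"base offset={base.storage_offset()}, base stride={base.stride()}")  # 0, (4, 1)
print(f"row offset={row.storage_offset()}, row stride={row.stride()}")      # 4, (1,)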
Consider a 3x4 tensor t:
t = torch.arange(12, dtype=torch.float32).view(3, 4)
print(f"Tensor t:\n{t}")
print(f"Shape: {t.shape}")
print(f"Stride: {t.stride()}")
Output:
Tensor t:
tensor([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.]])
Shape: torch.Size([3, 4])
Stride: (4, 1)
The stride (4, 1) means:

Moving one step along dimension 0 (to the next row) requires jumping 4 elements forward in the Storage (e.g., from element 0 to element 4).

Moving one step along dimension 1 (to the next column) requires jumping 1 element forward in the Storage (e.g., from element 0 to element 1).

The stride determines how the multi-dimensional tensor maps onto the linear Storage.
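One way to make this mapping concrete, continuing with t from above, is to compute the flat storage index of an element by hand: it is storage_offset() plus each index multiplied by the corresponding stride. The check below is a small sketch of that rule.
# storage index of t[i, j] = storage_offset + i * stride[0] + j * stride[1]
i, j = 2, 3
idx = t.storage_offset() + i * t.stride(0) + j * t.stride(1)
print(f"Computed storage index: {idx}")                                  # 11
print(f"t[2, 3] = {t[i, j].item()}, storage[{idx}] = {t.storage()[idx]}")  # both 11.0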
A tensor is considered contiguous in memory if its elements are laid out in the Storage in the same order as a standard C-style (row-major) traversal. For a contiguous tensor, the stride typically follows a pattern where the stride for the last dimension is 1, the stride for the second-to-last dimension is the size of the last dimension, and so on. For our 3x4 tensor t above, the stride is (4, 1), which matches this pattern (stride[1] == 1, stride[0] == shape[1] == 4), so it is contiguous.
print(f"Is t contiguous? {t.is_contiguous()}") # Output: True
However, operations like transposing can create non-contiguous tensors (views).
t_transposed = t.t() # Transpose operation
print(f"\nTransposed tensor t_transposed:\n{t_transposed}")
print(f"Shape: {t_transposed.shape}")
print(f"Stride: {t_transposed.stride()}")
print(f"Is t_transposed contiguous? {t_transposed.is_contiguous()}") # Output: False
print(f"Does t_transposed share storage with t? {t_transposed.storage().data_ptr() == t.storage().data_ptr()}") # Output: True
Output:
Transposed tensor t_transposed:
tensor([[ 0., 4., 8.],
[ 1., 5., 9.],
[ 2., 6., 10.],
[ 3., 7., 11.]])
Shape: torch.Size([4, 3])
Stride: (1, 4)
Is t_transposed contiguous? False
Does t_transposed share storage with t? True
Notice that t_transposed has shape (4, 3) but its stride is (1, 4). To move along dimension 0 (down a row in the transposed view), we jump 1 element in the original storage. To move along dimension 1 (across a column), we jump 4 elements. This layout is not C-contiguous.
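Since shape, stride, and offset fully determine how a view reads its storage, the transposed view can be reconstructed from that metadata alone. As a small sketch, torch.as_strided reinterprets t's existing storage with the transposed shape and stride:
# Rebuild the transposed view purely from metadata; no data is copied
rebuilt = torch.as_strided(t, size=(4, 3), stride=(1, 4), storage_offset=0)
print(f"Matches t.t()? {torch.equal(rebuilt, t_transposed)}")                       # True
print(f"Shares storage? {rebuilt.storage().data_ptr() == t.storage().data_ptr()}")  # True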
Why does contiguity matter? Some operations, such as view(), require the tensor to be contiguous. If you try to use view() on a non-contiguous tensor like t_transposed, you'll get an error. In such cases, you often need to use reshape(), which might return a view if possible but will return a copy if necessary to satisfy the shape change. Alternatively, you can explicitly create a contiguous copy using the .contiguous() method.
# This would raise a RuntimeError because t_transposed is not contiguous
# flat_view = t_transposed.view(-1)
# .contiguous() creates a new tensor with a contiguous memory layout if needed
t_contiguous_copy = t_transposed.contiguous()
print(f"\nIs contiguous copy contiguous? {t_contiguous_copy.is_contiguous()}") # Output: True
print(f"Stride of contiguous copy: {t_contiguous_copy.stride()}") # Output: (3, 1)
print(f"Storage shared? {t_contiguous_copy.storage().data_ptr() == t_transposed.storage().data_ptr()}") # Output: False (it's a copy)
# Now view works
flat_view = t_contiguous_copy.view(-1)
print(f"Flattened view: {flat_view}")
The following diagram illustrates how two tensors, T (the original 3x4) and T_transpose (its transpose), might map their elements onto the same underlying 1D Storage block. Note how the strides dictate the different access patterns.
Relationship between Tensor metadata and the underlying Storage for a 3x4 tensor T and its transpose T_transpose. Both Tensor objects point to the same Storage but interpret it differently based on their shape, stride, and offset. Arrows indicate how elements map from the Tensor view to the Storage index.
Understanding these implementation details (the distinction between Tensor metadata and Storage, the role of strides, and the concept of contiguity) provides a solid foundation for reasoning about memory usage, performance characteristics, and the behavior of various tensor operations in PyTorch. This knowledge becomes particularly useful when optimizing bottlenecks or interfacing with lower-level code.