Effectively managing where your computations run, whether on a CPU or one or more GPUs, is important for training deep learning models efficiently. If you've worked with TensorFlow, you're likely familiar with specifying devices using tf.device
or relying on Keras to handle placement. PyTorch offers a similar capability but with a more explicit approach to device control, which gives you fine-grained command over your hardware resources.
Before you can tell PyTorch where to run operations, you need to know what's available. PyTorch provides straightforward functions for this:
- torch.cuda.is_available(): Returns True if a CUDA-enabled GPU is found and accessible by PyTorch, False otherwise.
- torch.cuda.device_count(): Returns the number of available GPUs.
- torch.cuda.get_device_name(i): Returns the name of the GPU at index i (e.g., 'NVIDIA GeForce RTX 3090').

A common practice is to check for GPU availability and select it if present, falling back to the CPU otherwise.
import torch

if torch.cuda.is_available():
    device_name = torch.cuda.get_device_name(0)
    print(f"GPU is available. Using device: {device_name}")
    device = torch.device("cuda")
else:
    print("GPU not available, using CPU instead.")
    device = torch.device("cpu")

print(f"Selected device: {device}")

# Example output if GPU is available:
# GPU is available. Using device: NVIDIA GeForce RTX 3090
# Selected device: cuda

# Example output if GPU is not available:
# GPU not available, using CPU instead.
# Selected device: cpu
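If a machine has several GPUs, the same helper functions can enumerate them. A minimal sketch (it prints nothing on a CPU-only machine):

# List every CUDA device PyTorch can see, together with its name
for i in range(torch.cuda.device_count()):
    print(f"cuda:{i} -> {torch.cuda.get_device_name(i)}")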
In PyTorch, devices are represented by torch.device objects. You can create these objects by specifying the device type ('cpu' or 'cuda') and, optionally, an index for GPUs if you have multiple:

- torch.device('cpu'): Represents the system's CPU.
- torch.device('cuda'): Represents the default GPU (equivalent to torch.device('cuda:0')).
- torch.device('cuda:0'): Represents the first GPU.
- torch.device('cuda:1'): Represents the second GPU, and so on.

Attempting to use torch.device('cuda') when no GPU is available will result in an error. The conditional check shown above is the standard way to handle this.
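Note that merely constructing a torch.device object does not touch the hardware; the error occurs only when you actually place a tensor on a CUDA device that does not exist. A short sketch of the different ways to build these objects:

cpu_dev = torch.device("cpu")
default_gpu = torch.device("cuda")      # default GPU, resolved to cuda:0 when used
second_gpu = torch.device("cuda:1")     # index embedded in the string
also_second = torch.device("cuda", 1)   # type and index as separate arguments

print(cpu_dev, default_gpu, second_gpu, also_second)
# cpu cuda cuda:1 cuda:1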
Once you have a torch.device object, you can move tensors to the specified device using the .to() method. This method is out-of-place: it returns a new tensor on the target device and leaves the original tensor unchanged, so you typically reassign the result (for example, x = x.to(device)).
# Assume 'device' is defined as above (e.g., torch.device('cuda') or torch.device('cpu'))
# Create a tensor (defaults to CPU)
x_cpu = torch.randn(3, 3)
print(f"x_cpu device: {x_cpu.device}")
# Move it to the selected device
x_on_device = x_cpu.to(device)
print(f"x_on_device device: {x_on_device.device}")
# If 'device' is 'cuda', x_on_device will be on the GPU.
# If 'device' is 'cpu', x_on_device will remain on the CPU (or be copied if it was on GPU).
PyTorch also provides convenience methods:

- tensor.cuda(): Moves the tensor to the default GPU (equivalent to tensor.to(torch.device('cuda'))).
- tensor.cpu(): Moves the tensor to the CPU (equivalent to tensor.to(torch.device('cpu'))).

It's important to remember that operations between tensors generally require them to be on the same device. Attempting an operation on tensors residing on different devices (e.g., one on CPU, one on GPU) will raise a runtime error.
if torch.cuda.is_available():  # Ensure this code only runs if a GPU is present
    a_cpu = torch.randn(2, 2)
    b_gpu = torch.randn(2, 2).cuda()
    try:
        c = a_cpu + b_gpu  # This will cause an error
    except RuntimeError as e:
        print(f"Error: {e}")

    # To fix this, move a_cpu to the same device as b_gpu:
    a_gpu = a_cpu.cuda()
    c_gpu = a_gpu + b_gpu
    print(f"Sum on GPU: {c_gpu.device}")
else:
    print("Skipping GPU-specific tensor operation example as no GPU is available.")
This explicit management of tensor locality contrasts with TensorFlow, where device placement for operations defined within a tf.device('/GPU:0') block might handle tensor transfers more implicitly, or where Keras layers ensure their internal operations use appropriately placed tensors. In PyTorch, you are more directly responsible for ensuring data is where it needs to be.
Similar to tensors, PyTorch models (subclasses of torch.nn.Module
) also need to be moved to the desired device. Calling the .to(device)
method on a model moves all of its parameters and buffers to that device.
import torch.nn as nn

# Define a simple model
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        return self.linear(x)

model = SimpleNet()
print(f"Model parameters device (before move): {next(model.parameters()).device}")

# Move the model to the selected device
model.to(device)
print(f"Model parameters device (after move): {next(model.parameters()).device}")

# Now, any input data passed to model.forward() must also be on 'device'
# For example:
# dummy_input = torch.randn(5, 10).to(device)
# output = model(dummy_input)
# print(f"Output device: {output.device}")
When you call model.to(device), all parameters (weights and biases) and buffers within the model are transferred. It's a common practice to move the model to the target device once after its instantiation and before training begins. Subsequently, all input data fed to the model during the forward pass must also be on the same device.
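One detail worth noting (not shown above): unlike Tensor.to(), Module.to() modifies the module in place and returns the same object, so reassigning the result is optional. A quick check:

returned = model.to(device)
print(returned is model)  # True: the same module object is returned, already moved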
A standard pattern in PyTorch for training loops involves:

1. Selecting the device (CPU or GPU) at the beginning of your script.
2. Moving the model to the device using model.to(device).
3. Inside the training loop, moving the data tensors (inputs and labels) from each DataLoader batch to the device before passing them to the model or loss function.

from torch.utils.data import DataLoader, TensorDataset

# 'model' and 'device' are already defined above, and the model is on 'device'.
# A small dummy dataset stands in for your real data pipeline here.
dataset = TensorDataset(torch.randn(20, 10), torch.randn(20, 1))
dataloader = DataLoader(dataset, batch_size=5)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()  # Loss function
num_epochs = 2

for epoch in range(num_epochs):
    for inputs, labels in dataloader:
        # 1. Move data to the selected device
        inputs = inputs.to(device)
        labels = labels.to(device)

        # 2. Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)  # Loss is also computed on 'device'

        # 3. Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch+1} completed.")
This ensures all computations related to the model's forward pass, loss calculation, and backpropagation occur on the designated device.
Comparison with TensorFlow's tf.device

In TensorFlow, you might use a context manager like with tf.device('/GPU:0'): to suggest where operations and variables should be placed. The TensorFlow runtime then often manages the placement or necessary data transfers. While tf.config.set_visible_devices offers more explicit control over which devices TensorFlow can see, the per-operation placement is often handled by these context managers or by the Keras API during model construction.
PyTorch, on the other hand, requires you to be more explicit: you call tensor.to(device) or model.to(device) to move them, and operations will then run on the device where their input tensors reside. This direct control can be very powerful, offering clarity about data location, but it also means you are responsible for these transfers.

NumPy arrays are always CPU-bound. If you need to convert a PyTorch tensor that resides on a GPU to a NumPy array, you must first move it to the CPU:
gpu_tensor = torch.randn(2, 2).to(device)  # 'device' may be a GPU or the CPU

if gpu_tensor.is_cuda:  # Check if the tensor is actually on CUDA
    cpu_tensor = gpu_tensor.cpu()
    numpy_array = cpu_tensor.numpy()
    print("Converted GPU tensor to NumPy array via CPU.")
else:  # Tensor is already on CPU
    numpy_array = gpu_tensor.numpy()
    print("Converted CPU tensor to NumPy array.")

# To go from NumPy to a PyTorch tensor on a specific device:
new_tensor = torch.from_numpy(numpy_array).to(device)
Forgetting the .cpu()
call before .numpy()
on a GPU tensor will result in an error.
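As a quick illustration (the exception type and message vary across PyTorch versions, so this sketch catches both common cases):

if torch.cuda.is_available():
    t = torch.randn(2, 2, device="cuda")
    try:
        t.numpy()  # Fails: the tensor lives in GPU memory
    except (TypeError, RuntimeError) as e:
        print(f"Error: {e}")  # The message suggests calling Tensor.cpu() first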
While GPUs accelerate computations, transferring data between CPU and GPU memory incurs overhead. For optimal performance, keep transfers to a minimum: move data to the GPU once per batch rather than tensor by tensor, and keep intermediate results on the device instead of bouncing them back to the CPU. The batched loading provided by DataLoaders helps with this.
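Two commonly used options in this area, shown here as a sketch rather than a prescription, are pinned host memory on the DataLoader and non-blocking copies when moving batches to the GPU:

from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 10), torch.randn(100, 1))

# pin_memory=True keeps batches in page-locked host memory,
# which speeds up host-to-GPU copies and allows them to be asynchronous
loader = DataLoader(dataset, batch_size=32, pin_memory=True)

for inputs, labels in loader:
    # non_blocking=True lets the copy overlap with other GPU work
    inputs = inputs.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)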
When saving and loading models (covered in a later chapter), torch.load() provides a map_location argument. This argument is very useful for loading models onto a device different from where they were saved (e.g., loading a GPU-trained model onto a CPU-only machine).
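A minimal sketch of that usage, assuming the checkpoint stores the model's state_dict and using a placeholder file name:

# Load a checkpoint saved on a GPU machine onto whatever device is available here.
# "model_checkpoint.pt" is a placeholder path for illustration.
state_dict = torch.load("model_checkpoint.pt", map_location=device)
model.load_state_dict(state_dict)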
Here's a quick comparison:
| Feature | TensorFlow (Common Practice) | PyTorch |
|---|---|---|
| Device Specification | Strings (e.g., '/CPU:0', '/GPU:0') | torch.device object (e.g., torch.device('cuda')) |
| Tensor Placement | with tf.device(...) scope | tensor.to(device), tensor.cuda(), tensor.cpu() |
| Model Placement | Often implicit with Keras, or via tf.distribute.Strategy | model.to(device) |
| Default Tensor Device | CPU (unless within a GPU device scope or GPU is default) | CPU |
| Check GPU Availability | len(tf.config.list_physical_devices('GPU')) > 0 | torch.cuda.is_available() |
| Moving to NumPy | tensor.numpy() (device agnostic under eager execution) | tensor.cpu().numpy() (if tensor is on GPU) |
PyTorch's approach to device management gives you direct and unambiguous control. While this means you need to explicitly manage data movement, it also simplifies debugging device-related issues and provides clarity on where your computations are taking place. This explicit control forms a solid foundation for more advanced scenarios, such as distributed training across multiple GPUs or machines.