requires_grad)backward()).grad)torch.nntorch.nn.Module Base Classtorch.nn losses)torch.optim)torch.utils.data.Datasettorchvision.transforms)torch.utils.data.DataLoadertorchvision.transforms)Raw data, such as images or text, rarely comes in a format perfectly suited for direct input into a neural network. Models often expect numerical tensors of a specific size and distribution. Furthermore, to improve model generalization and prevent overfitting, it's common practice to artificially expand the training dataset by applying random modifications to the existing data. This is where data transformations come into play.
PyTorch, particularly through the torchvision library for computer vision tasks, provides a convenient module, torchvision.transforms, containing a variety of common operations that can be chained together to create a data processing pipeline. These transformations serve two primary purposes:
Let's look at some essential transforms.
These transforms are typically applied to all dataset splits (training, validation, and test) to ensure consistency.
transforms.ToTensor(): This is often one of the first transforms applied to image data loaded using libraries like PIL (Python Imaging Library) or NumPy. It converts a PIL Image or NumPy array (in Height x Width x Channel format) into a PyTorch FloatTensor (in Channel x Height x Width format). Importantly, it also scales the pixel values from the range [0, 255] to [0.0, 1.0]. This conversion to tensors and standardized range is necessary for model input.
transforms.Resize(size): Resizes an input image to a given size. If size is an integer, the smaller edge of the image will be matched to this number, maintaining the aspect ratio. If size is a sequence like (h, w), it resizes the image to the exact height h and width w. This is important because many neural networks require fixed-size inputs.
transforms.CenterCrop(size): Crops the central part of an image to the given size. This is often used after resizing to ensure the final image dimensions are exact, focusing on the central region.
transforms.Normalize(mean, std): Normalizes a tensor image using the provided mean and standard deviation for each channel. The operation applied is:
output=(input−mean)/std
Normalization helps stabilize training and can lead to faster convergence by ensuring input features have a similar scale, often centered around zero. The mean and std are typically sequences of values, one for each input channel (e.g., 3 values for RGB images). Pre-computed values from large datasets like ImageNet are often used as defaults: mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225].
These transforms introduce randomness and are typically applied only to the training dataset. This helps the model learn to be invariant to minor changes in the input, making it less prone to overfitting.
transforms.RandomHorizontalFlip(p=0.5): Horizontally flips the image randomly with a given probability p (default is 0.5, meaning 50% chance).transforms.RandomRotation(degrees): Rotates the image by a random angle selected uniformly from (-degrees, +degrees) or within a specific range if degrees is a sequence (min, max).transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0): Randomly changes the brightness, contrast, saturation, and hue of an image. You can specify the range of jitter for each attribute. For example, brightness=0.2 means randomly picking a brightness factor between [max(0, 1 - 0.2), 1 + 0.2].transforms.RandomResizedCrop(size): Crops a random portion of the image and resizes it to the desired size. This is a very common augmentation technique, especially for training image classification models like Inception networks.You rarely apply just one transformation. PyTorch makes it easy to chain multiple transformations together using transforms.Compose. It takes a list of transform objects and applies them sequentially.
Here's an example of creating a processing pipeline for training data, including resizing, augmentation, conversion to a tensor, and normalization:
import torchvision.transforms as transforms
# Example transform pipeline for training
train_transform = transforms.Compose([
transforms.Resize(256), # Resize smaller edge to 256
transforms.RandomCrop(224), # Randomly crop a 224x224 patch
transforms.RandomHorizontalFlip(), # Randomly flip horizontally
transforms.ToTensor(), # Convert PIL Image to tensor (0-1 range)
transforms.Normalize(mean=[0.485, 0.456, 0.406], # Normalize with ImageNet stats
std=[0.229, 0.224, 0.225])
])
# Example transform pipeline for validation/testing (no augmentation)
test_transform = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224), # Center crop to 224x224
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
])
print("Training Transforms:")
print(train_transform)
print("\nTesting Transforms:")
print(test_transform)
As mentioned in the previous section on Dataset objects, these composed transforms are typically passed as an argument (often named transform or target_transform) during the instantiation of a Dataset. For built-in datasets like those in torchvision.datasets, this is straightforward:
# Assuming you have torchvision installed
from torchvision.datasets import ImageFolder
from pathlib import Path
# Example usage with torchvision's ImageFolder
train_data_path = Path("path/to/your/train_images")
test_data_path = Path("path/to/your/test_images")
train_dataset = ImageFolder(root=train_data_path, transform=train_transform)
test_dataset = ImageFolder(root=test_data_path, transform=test_transform)
# When you access an item from train_dataset, train_transform is applied
sample_image, sample_label = train_dataset[0] # sample_image is now a transformed tensor
For custom Dataset classes, you would typically accept the transform object in the __init__ method and apply it within the __getitem__ method before returning the sample.
from torch.utils.data import Dataset
from PIL import Image
class CustomImageDataset(Dataset):
def __init__(self, image_paths, labels, transform=None):
self.image_paths = image_paths
self.labels = labels
self.transform = transform
def __len__(self):
return len(self.image_paths)
def __getitem__(self, idx):
image_path = self.image_paths[idx]
label = self.labels[idx]
image = Image.open(image_path).convert("RGB") # Load image
if self.transform:
image = self.transform(image) # Apply transformations
return image, label
# Usage
custom_train_dataset = CustomImageDataset(train_paths, train_labels, transform=train_transform)
custom_test_dataset = CustomImageDataset(test_paths, test_labels, transform=test_transform)
By defining appropriate transformations and integrating them into your Dataset, you ensure that the data fed into your model is properly formatted and, for training data, sufficiently augmented. This sets the stage for the next step: efficiently loading this processed data in batches using DataLoader.
Was this section helpful?
torchvision, including usage examples and parameters.© 2026 ApX Machine LearningEngineered with