All Courses

Image Data Augmentation with tf.data

When training image models, especially deep neural networks, having a large and diverse dataset is often significant for achieving good generalization and preventing overfitting. Overfitting occurs when a model learns the training data too well, including its noise and specific patterns, but fails to perform well on new, unseen data. Image data augmentation is a common technique to artificially expand the training dataset by creating modified versions of existing images.

Instead of applying these modifications manually and storing the augmented images, which can drastically increase storage requirements, we can perform augmentation on-the-fly as part of our tf.data input pipeline. This approach is memory-efficient and allows the augmentations to be performed potentially in parallel with model training, leveraging CPU or even GPU resources.

Using `tf.image` for Augmentation

TensorFlow provides the tf.image module, which contains a collection of functions specifically designed for image manipulation, including common augmentation techniques. These functions operate directly on tensors, making them integrate naturally into a tf.data pipeline using the Dataset.map() transformation.

Let's look at some frequently used augmentation functions within tf.image:

Geometric Transformations:
- tf.image.random_flip_left_right(image): Randomly flips an image horizontally (left to right).
- tf.image.random_flip_up_down(image): Randomly flips an image vertically (top to bottom). Note: This might not be suitable for all datasets (e.g., images where orientation matters, like digits).
- tf.image.rot90(image, k): Rotates an image by 90 degrees k times. You can use tf.random.uniform to choose a random k. More complex rotations often require external libraries or custom implementations.
Color and Contrast Adjustments:
- tf.image.random_brightness(image, max_delta): Adjusts the brightness of an image by a random factor chosen uniformly from [-max_delta, max_delta]. max_delta should be non-negative.
- tf.image.random_contrast(image, lower, upper): Adjusts the contrast of an image by a random factor chosen uniformly from [lower, upper].
- tf.image.random_saturation(image, lower, upper): Adjusts the saturation of an RGB image by a random factor chosen uniformly from [lower, upper].
- tf.image.random_hue(image, max_delta): Adjusts the hue of an RGB image by a random factor chosen uniformly from [-max_delta, max_delta]. max_delta should be in the interval [0, 0.5].

These operations expect input image tensors, typically with shape [height, width, channels] or [batch, height, width, channels]. When used within Dataset.map(), they are applied individually to each image tensor flowing through the pipeline.

Integrating Augmentation into the Pipeline

The standard practice is to apply augmentations only to the training dataset. The validation and test datasets should remain unchanged to provide a consistent measure of the model's performance on unseen, original data.

We can achieve this by defining a function that applies augmentations and mapping it only onto the training dataset. It's generally best to apply augmentations after decoding and resizing images but before batching.

Here's how you might structure a preprocessing function that includes random augmentation:

import tensorflow as tf

# Assume IMG_HEIGHT and IMG_WIDTH are defined
IMG_HEIGHT = 180
IMG_WIDTH = 180

def decode_and_resize(image_bytes):
  """Decodes JPEG, converts to float, and resizes."""
  img = tf.io.decode_jpeg(image_bytes, channels=3)
  img = tf.image.convert_image_dtype(img, tf.float32)
  img = tf.image.resize(img, [IMG_HEIGHT, IMG_WIDTH])
  return img

def augment_image(image, label):
  """Applies random augmentations to an image."""
  image = tf.image.random_flip_left_right(image)
  image = tf.image.random_brightness(image, max_delta=0.2)
  image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
  # Add more tf.image augmentations as needed
  # Ensure image values remain in a valid range, e.g. [0, 1] for float images
  image = tf.clip_by_value(image, 0.0, 1.0)
  return image, label

# Example Usage within a tf.data pipeline:
# Assume 'train_files_ds' is a Dataset of image file paths
# Assume 'load_and_preprocess_image' reads file path, decodes, resizes
AUTOTUNE = tf.data.AUTOTUNE

def load_and_preprocess_image(path):
    image = tf.io.read_file(path)
    # For simplicity, assume label can be derived or is paired elsewhere
    # In reality, you'd load labels appropriately.
    label = 0 # Placeholder label
    return decode_and_resize(image), label

# Create a dataset of file paths (replace with your actual data source)
list_ds = tf.data.Dataset.list_files('path/to/images/*.jpg') # Example

# Create the training dataset
train_ds = list_ds.map(load_and_preprocess_image, num_parallel_calls=AUTOTUNE)

# Apply augmentation only to the training set
train_ds = train_ds.map(augment_image, num_parallel_calls=AUTOTUNE)

# Configure for performance: shuffle, batch, prefetch
train_ds = train_ds.shuffle(buffer_size=1000)
train_ds = train_ds.batch(32) # Example batch size
train_ds = train_ds.prefetch(buffer_size=AUTOTUNE)

# Create the validation dataset (NO augmentation)
# Assume 'val_files_ds' is the source for validation data
# val_ds = val_files_ds.map(load_and_preprocess_image, num_parallel_calls=AUTOTUNE)
# val_ds = val_ds.batch(32)
# val_ds = val_ds.prefetch(buffer_size=AUTOTUNE)

print("Training Dataset Spec:", train_ds.element_spec)
# Output might look like:
# (TensorSpec(shape=(None, 180, 180, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))

In this example:

We define separate functions for basic preprocessing (decode_and_resize) and augmentation (augment_image).
The load_and_preprocess_image function handles reading, decoding, and resizing.
The train_ds first maps load_and_preprocess_image to get processed images.
Crucially, train_ds then maps the augment_image function.
A val_ds would skip the augment_image map step.
Standard performance optimizations (shuffle, batch, prefetch) are applied afterwards. Using num_parallel_calls=tf.data.AUTOTUNE in the map functions allows tf.data to automatically tune the level of parallelism for data transformation, improving efficiency.

Considerations for Augmentation

Relevance: Choose augmentations that reflect realistic variations in your data domain. Flipping an image of a handwritten digit '6' might turn it into a '9', which is likely undesirable. However, flipping an image of a cat horizontally is usually acceptable. Adjusting brightness and contrast often mimics different lighting conditions.
Order: The order in which augmentations are applied can sometimes matter. For instance, rotating an image and then cropping it might yield a different result than cropping first and then rotating the cropped section. Experiment to see what works best.
Intensity: Be mindful of the intensity of augmentations (e.g., max_delta for brightness). Overly aggressive augmentation can sometimes distort images too much, making them unrepresentative and potentially harming training.
Performance: While tf.image operations are optimized, complex sequences of augmentations can still consume significant CPU resources. Ensure your data pipeline with augmentation can keep up with the GPU's training speed by using prefetch and monitoring your input pipeline performance, perhaps using the TensorFlow Profiler. For very computationally intensive augmentations, you might investigate libraries like KerasCV which offer GPU-accelerated augmentation layers that can be included directly in your model definition.

By integrating image data augmentation directly into your tf.data pipeline using tf.image functions, you can efficiently prepare and expand your training data, leading to more resilient models that generalize better to new data. Remember to apply these transformations thoughtfully, considering their relevance to your specific task and dataset.

Was this section helpful?

Image Data Augmentation with tf.data

Using tf.image for Augmentation

Integrating Augmentation into the Pipeline

Considerations for Augmentation

Using `tf.image` for Augmentation