Convolutional Neural Networks excel at processing grid-like data, and images are a primary example. However, raw image files aren't directly usable by a neural network. They need to be loaded, decoded, potentially resized to a uniform dimension, and normalized before being fed into the input layer of your CNN. Keras provides convenient tools to streamline this entire process, especially when dealing with large datasets.
A common way to organize image datasets for classification tasks is to have one subdirectory for each class within a main data directory. For example:
data/
├── train/
│   ├── cats/
│   │   ├── cat_001.jpg
│   │   ├── cat_002.jpg
│   │   └── ...
│   └── dogs/
│       ├── dog_001.jpg
│       ├── dog_002.jpg
│       └── ...
└── validation/
    ├── cats/
    │   ├── cat_101.jpg
    │   └── ...
    └── dogs/
        ├── dog_101.jpg
        └── ...
Keras offers the image_dataset_from_directory utility, which is well suited to this structure. It automatically infers class labels from the directory names and generates batches of images and corresponding labels.
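To make the label inference concrete, here is a minimal standard-library sketch of the convention: class names are the subdirectory names in sorted order, and each file's label is the index of its parent directory in that list. The helper names infer_class_names and list_labeled_files are hypothetical, and this illustrates the idea only; it is not how Keras implements the utility internally.

```python
import tempfile
from pathlib import Path


def infer_class_names(data_dir):
    """Return sorted subdirectory names; a name's position is its integer label."""
    return sorted(p.name for p in Path(data_dir).iterdir() if p.is_dir())


def list_labeled_files(data_dir):
    """Pair every file in each class subdirectory with that class's integer label."""
    class_names = infer_class_names(data_dir)
    pairs = []
    for label, name in enumerate(class_names):
        for path in sorted((Path(data_dir) / name).iterdir()):
            if path.is_file():
                pairs.append((path.name, label))
    return class_names, pairs


# Build a tiny throwaway tree shaped like data/train/ (empty placeholder files).
with tempfile.TemporaryDirectory() as tmp:
    layout = {"dogs": ["dog_001.jpg"], "cats": ["cat_001.jpg", "cat_002.jpg"]}
    for cls, files in layout.items():
        (Path(tmp) / cls).mkdir()
        for filename in files:
            (Path(tmp) / cls / filename).touch()
    class_names, pairs = list_labeled_files(tmp)

print(class_names)
print(pairs)
```

Because the names are sorted, cats maps to 0 and dogs to 1 regardless of the order in which the directories were created.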
import keras
import tensorflow as tf  # Often needed for dataset operations even with other backends

# Define image dimensions and batch size
img_height = 180
img_width = 180
batch_size = 32

# Load training data
train_ds = keras.utils.image_dataset_from_directory(
    "data/train",
    labels='inferred',  # Infer labels from directory structure
    label_mode='int',   # Use integer labels (e.g., 0 for cats, 1 for dogs)
                        # Or 'categorical' for one-hot encoded labels, 'binary' for 2 classes
    image_size=(img_height, img_width),  # Resize images during loading
    interpolation='nearest',  # Method for resizing
    batch_size=batch_size,
    shuffle=True  # Shuffle the data
)

# Load validation data (usually no shuffling needed)
val_ds = keras.utils.image_dataset_from_directory(
    "data/validation",
    labels='inferred',
    label_mode='int',
    image_size=(img_height, img_width),
    interpolation='nearest',
    batch_size=batch_size,
    shuffle=False
)

# Print class names found
print("Class names:", train_ds.class_names)
This function returns a tf.data.Dataset object (even when using backends like PyTorch or JAX via Keras 3, TensorFlow's tf.data API is often used for efficient data loading pipelines). This object yields batches of (images, labels), where images is a tensor of shape (batch_size, img_height, img_width, channels) and labels is a tensor of shape (batch_size,). The number of channels is typically 3 for RGB images or 1 for grayscale.
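The label shape above applies to label_mode='int'. As a rough sketch of how the other modes lay out the same labels, the hypothetical helper encode_labels below mimics the three encodings with plain Python lists. It assumes, following the Keras documentation as best I recall it, that 'categorical' yields one row of length num_classes per image and 'binary' yields a single 0/1 float per image.

```python
def encode_labels(int_labels, num_classes, label_mode):
    """Encode integer class indices in the layout each label_mode implies."""
    if label_mode == "int":
        # One integer per image: shape (batch_size,)
        return list(int_labels)
    if label_mode == "categorical":
        # One-hot row per image: shape (batch_size, num_classes)
        return [[1.0 if i == y else 0.0 for i in range(num_classes)]
                for y in int_labels]
    if label_mode == "binary":
        # One 0/1 float per image: shape (batch_size, 1); only valid for 2 classes
        if num_classes != 2:
            raise ValueError("'binary' requires exactly 2 classes")
        return [[float(y)] for y in int_labels]
    raise ValueError(f"unknown label_mode: {label_mode!r}")


batch = [0, 1, 1]  # e.g. cat, dog, dog
for mode in ("int", "categorical", "binary"):
    print(mode, encode_labels(batch, num_classes=2, label_mode=mode))
```

The choice matters downstream: integer labels pair with a sparse categorical loss, one-hot labels with a categorical loss, and 0/1 labels with a binary loss.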
Raw pixel values (usually integers from 0 to 255) are often not ideal for neural network training. Two common preprocessing steps are essential:
Resizing: Although image_dataset_from_directory can resize images during loading, you might load data differently or want resizing as part of your model definition. The keras.layers.Resizing layer can be added to your model.
Rescaling: Pixel values are usually scaled from [0, 255] down to a small range such as [0, 1] or [-1, 1]. The keras.layers.Rescaling layer is perfect for this.
You can integrate these preprocessing steps directly into your model using Keras layers. This ensures that the preprocessing is applied consistently during training, evaluation, and inference, and it can even happen on the GPU for better performance.
import keras
from keras import layers

# Define input shape (height, width, channels)
input_shape = (img_height, img_width, 3)  # Assuming RGB images

# Create a model incorporating preprocessing
preprocessing_model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Rescaling(1./255),  # Scale pixel values from [0, 255] to [0, 1]
        # Alternatively, for [-1, 1] scaling: layers.Rescaling(1./127.5, offset=-1)
    ]
)

# You can then add your CNN layers after this preprocessing block
# Example:
# model = keras.Sequential([
#     preprocessing_model,
#     layers.Conv2D(...),
#     # ... more CNN layers ...
# ])
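The arithmetic behind the two Rescaling variants in the code above is simple: each pixel is multiplied by scale and then offset is added. The plain-Python sketch below reproduces that formula on a few sample pixel values; the rescale helper is a hypothetical stand-in for illustration, not the Keras layer itself.

```python
import math


def rescale(pixels, scale, offset=0.0):
    """Apply the element-wise formula pixel * scale + offset."""
    return [p * scale + offset for p in pixels]


raw = [0, 64, 128, 255]  # sample 8-bit pixel intensities

unit_range = rescale(raw, 1.0 / 255)                    # maps [0, 255] to [0, 1]
signed_range = rescale(raw, 1.0 / 127.5, offset=-1.0)   # maps [0, 255] to [-1, 1]

print(unit_range)
print(signed_range)

# Endpoints land where expected (up to floating-point rounding).
assert math.isclose(unit_range[-1], 1.0)
assert math.isclose(signed_range[0], -1.0)
assert math.isclose(signed_range[-1], 1.0)
```

Which range to use generally depends on what the rest of the network expects; a pretrained backbone, for instance, typically documents the input range it was trained with.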
Deep learning models, especially CNNs trained on images, benefit significantly from larger datasets. When collecting more data is expensive or impractical, data augmentation offers a powerful alternative. It involves applying random transformations to your existing training images to generate slightly modified, yet plausible, new training examples. This helps the model become more invariant to variations like changes in position, orientation, brightness, or zoom, leading to better generalization and reduced overfitting.
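As a toy illustration of the idea, the sketch below applies a random horizontal flip to a tiny nested-list "image" several times. Each call may return a mirrored copy, while the class label stays the same. The function random_horizontal_flip and the 2x3 image are hypothetical and use only the standard library; real pipelines would operate on image tensors.

```python
import random


def random_horizontal_flip(image, rng):
    """Mirror each row left-to-right with probability 0.5; otherwise return a copy."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return [row[:] for row in image]


# A toy 2x3 single-channel "image"; real images are height x width x channels.
image = [[1, 2, 3],
         [4, 5, 6]]
label = 0  # the class label is unchanged by the transformation

rng = random.Random(0)  # seeded so the run is repeatable
variants = [(random_horizontal_flip(image, rng), label) for _ in range(4)]

for flipped, lbl in variants:
    print(flipped, lbl)
```

Every variant is either the original or its mirror image, so the model sees plausible new inputs without any new data being collected.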
Keras provides a suite of preprocessing layers specifically designed for data augmentation. These layers apply transformations randomly during training but are inactive during evaluation or inference. Common augmentation layers include:
layers.RandomFlip("horizontal"): Randomly flips images horizontally.
layers.RandomRotation(factor=0.1): Randomly rotates images by a fraction of 2π (e.g., 0.1 means up to +/- 10% of 360 degrees, or +/- 36 degrees).
layers.RandomZoom(height_factor=0.2): Randomly zooms images in or out vertically and horizontally.
layers.RandomContrast(factor=0.2): Randomly adjusts image contrast.
layers.RandomBrightness(factor=0.2): Randomly adjusts image brightness. The layer assumes inputs in [0, 255] by default, so pass value_range=(0, 1) if it follows rescaling to [0, 1].
These augmentation layers are typically placed after resizing and rescaling but before the main convolutional layers in your model definition.
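Since the rotation factor is expressed as a fraction of a full turn, it can help to convert it to degrees when choosing a value. The helper below is a hypothetical convenience for that conversion and is not part of Keras.

```python
import math


def rotation_range_degrees(factor):
    """Convert a rotation factor (fraction of a full turn) to a symmetric degree range."""
    max_degrees = factor * 360.0
    return (-max_degrees, max_degrees)


def rotation_range_radians(factor):
    """Same range expressed in radians, i.e. as a fraction of 2*pi."""
    max_radians = factor * 2.0 * math.pi
    return (-max_radians, max_radians)


for factor in (0.05, 0.1, 0.25):
    low, high = rotation_range_degrees(factor)
    print(f"factor={factor}: rotations between {low:.1f} and {high:.1f} degrees")
```

Small factors are usually a sensible starting point for natural photos, since large rotations can produce images that no longer look plausible for the class.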
import keras
from keras import layers

# Build a data augmentation pipeline
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
        layers.RandomContrast(0.1),
        # Add more augmentation layers as needed
    ],
    name="data_augmentation",
)

# Integrate into a full model definition
model = keras.Sequential([
    keras.Input(shape=input_shape),
    # Apply rescaling first
    layers.Rescaling(1./255),
    # Apply augmentation (only active during training)
    data_augmentation,
    # Now add the CNN base
    layers.Conv2D(filters=32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(filters=64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    # ... Flatten and Dense layers for classification ...
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),  # Dropout is another form of regularization
    layers.Dense(1, activation='sigmoid')  # Example for binary classification
])

# Compile the model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Now you can train this model using train_ds and val_ds
# model.fit(train_ds, epochs=..., validation_data=val_ds)
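It can be useful to check by hand how the spatial size changes through the convolution and pooling layers of the model above. The sketch below works through that arithmetic for a 180x180 input, assuming the Keras defaults of stride 1 with 'valid' padding for Conv2D and a stride equal to pool_size for MaxPooling2D. The helper names are hypothetical.

```python
def conv2d_valid(size, kernel_size):
    """Output size of a stride-1 convolution with 'valid' (no) padding."""
    return size - kernel_size + 1


def max_pool(size, pool_size):
    """Output size of non-overlapping max pooling with 'valid' padding."""
    return size // pool_size


size = 180                      # img_height == img_width == 180
size = conv2d_valid(size, 3)    # Conv2D(32, kernel_size=3)
size = max_pool(size, 2)        # MaxPooling2D(pool_size=2)
size = conv2d_valid(size, 3)    # Conv2D(64, kernel_size=3)
size = max_pool(size, 2)        # MaxPooling2D(pool_size=2)

flattened = size * size * 64            # Flatten over the 64 feature maps
dense_params = flattened * 128 + 128    # weights plus biases of Dense(128)

print(size, flattened, dense_params)
```

Under these assumptions the feature maps shrink from 180 to 43 pixels per side, the flattened vector has 118,336 elements, and the first Dense layer alone holds about 15.1 million parameters, which is one reason dropout is applied right after it.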
An example flow for preparing image data for a Keras CNN model, including loading, rescaling, and augmentation.
By leveraging Keras utilities like image_dataset_from_directory and the preprocessing/augmentation layers, you can build efficient and robust input pipelines for your CNNs. Remember that careful preprocessing and appropriate data augmentation are often significant factors in achieving high performance on image-based tasks.
© 2025 ApX Machine Learning