Efficient data handling is fundamental to training machine learning models, particularly when dealing with large datasets where I/O can become a significant bottleneck. TensorFlow's tf.data API offers a powerful and flexible way to build performant input pipelines that decouple data extraction and transformation from model training.
In this chapter, you will learn to use the tf.data API effectively. We will cover:

- Creating tf.data.Dataset objects from various sources, including in-memory arrays (NumPy, Tensors), Python generators, and optimized file formats like TFRecord.
- Applying transformations such as map() for element-wise preprocessing, batch() for grouping data, shuffle() for randomization, and prefetch() for performance optimization.
- Integrating tf.data pipelines seamlessly with the Keras model.fit() API for training and evaluation.

Upon completing this chapter, you will be able to construct scalable and efficient data loading mechanisms for your TensorFlow models.
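As a brief preview of the creation methods covered in sections 5.2 and 5.3, the sketch below builds Datasets from a NumPy array, a Python generator, and a TFRecord file. The generator's contents and the file path are hypothetical placeholders, not examples from the chapter.

```python
import numpy as np
import tensorflow as tf

# From in-memory NumPy arrays (or tf.Tensors): each row becomes one element.
array_ds = tf.data.Dataset.from_tensor_slices(np.arange(10, dtype="float32"))

# From a Python generator: output_signature declares each element's shape and dtype.
def gen():
    for i in range(10):  # hypothetical stream of scalar values
        yield float(i)

gen_ds = tf.data.Dataset.from_generator(
    gen, output_signature=tf.TensorSpec(shape=(), dtype=tf.float32)
)

# From TFRecord files: records arrive as raw bytes until parsed (see section 5.3).
# "data.tfrecord" is a placeholder path.
tfrecord_ds = tf.data.TFRecordDataset(["data.tfrecord"])
```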
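And as a preview of sections 5.4 through 5.7, here is a minimal end-to-end sketch of the pipeline pattern this chapter builds up to. The synthetic data, the doubling step in scale(), and the tiny model are illustrative assumptions; the chained map(), shuffle(), batch(), and prefetch() calls are the part to focus on.

```python
import numpy as np
import tensorflow as tf

# Hypothetical in-memory data: 1000 examples with 32 features each.
features = np.random.rand(1000, 32).astype("float32")
labels = np.random.randint(0, 2, size=(1000, 1)).astype("float32")

# Extraction: build a Dataset from NumPy arrays.
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# Placeholder element-wise preprocessing step.
def scale(x, y):
    return x * 2.0, y

# Transformation: preprocess, randomize, group into batches, and
# overlap preprocessing with training via prefetch().
dataset = (
    dataset
    .map(scale, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(buffer_size=1000)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)

# Loading: a Dataset of (features, labels) batches can be passed
# directly to Keras for training.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(dataset, epochs=2)
```

Because shuffle() precedes batch(), examples are randomized individually rather than in fixed batch-sized groups; the chapter returns to this ordering question in section 5.5.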
5.1 Why tf.data?
5.2 Creating Datasets from Tensors, NumPy, and Generators
5.3 Working with TFRecord Files
5.4 Applying Transformations: map()
5.5 Batching and Shuffling
5.6 Prefetching for Performance
5.7 Integrating tf.data with model.fit()
5.8 Image Data Augmentation with tf.data
5.9 Hands-on Practical: Building an Image Data Pipeline