Optimizing Data Loading and Preprocessing Pipelines
Build TensorFlow input pipelines, TensorFlow Developers, 2024 - Official guide for the tf.data API, covering parallelization, prefetching, caching, and efficient data formats like TFRecord.
torch.utils.data.DataLoader, PyTorch Developers, 2025 - Official documentation explaining the PyTorch DataLoader for parallel data loading with num_workers and pin_memory options.
Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications, Chip Huyen, 2022 (O'Reilly Media) - A book covering the design and optimization of machine learning systems, including data pipelines, infrastructure, and performance considerations.