Data Lake Architecture: How to Build and Manage a Scalable Data Lake, Ben Sharma, Michael Armbrust, and Xin Liu, 2021 (O'Reilly Media) - A comprehensive guide to designing and implementing data lakes, with detailed discussions of ingestion patterns, data layering (e.g., the Bronze/Raw layer), and architectural considerations relevant to batch workflows.
Delta Lake Documentation, The Delta Lake Project Contributors, 2023 - Official documentation for Delta Lake, an open-source storage layer that brings ACID transactions to data lakes, addressing critical concerns such as consistency, isolation, and reliability for batch ingestion.
Designing Data-Intensive Applications, Martin Kleppmann, 2017 (O'Reilly Media) - A foundational book on the principles of data systems, covering batch processing, data models, consistency, and distributed storage. It provides background relevant to data ingestion challenges such as isolation and late-arriving data.
Tuning Spark, The Apache Spark Project Contributors, 2023 - The official Apache Spark documentation provides guidelines and best practices for optimizing Spark applications, including strategies for data partitioning and managing file sizes to improve query performance in data lakes.