Introduction to Data Lake Architectures
Chapter 1: Architectural Foundations
Decoupling Compute and Storage
The Medallion Architecture
Lambda and Kappa Architectures
Chapter 2: File Formats and Optimization
Row-Oriented vs. Columnar Storage
Hands-on Practical: Optimizing File Layouts
Chapter 3: Ingestion Pipelines
Batch Ingestion Workflows
Change Data Capture (CDC)
Handling Schema Evolution
Hands-on Practical: Building a CDC Pipeline
Chapter 4: Metadata and Cataloging
The Role of the Metastore
Data Lineage Implementation
Hands-on Practical: Configuring a Catalog
Chapter 5: Querying and Performance
Distributed Query Engines
File Pruning and Skipping
Vectorized Query Execution
Hands-on Practical: Query Analysis