Introduction to Data Lake Architectures
Chapter 1: Architectural Foundations
Decoupling Compute and Storage
The Medallion Architecture
Lambda and Kappa Architectures
Chapter 2: File Formats and Optimization
Row-Oriented vs Columnar Storage
Hands-on Practical: Optimizing File Layouts
Chapter 3: Ingestion Pipelines
Batch Ingestion Workflows
Change Data Capture (CDC)
Handling Schema Evolution
Hands-on Practical: Building a CDC Pipeline
Chapter 4: Metadata and Cataloging
The Role of the Metastore
Data Lineage Implementation
Hands-on Practical: Configuring a Catalog
Chapter 5: Querying and Performance
Distributed Query Engines
File Pruning and Skipping
Vectorized Query Execution
Hands-on Practical: Query Analysis