MERGE INTO, Delta Lake Project, 2024 - Provides detailed syntax, semantics, and examples for performing UPSERT/MERGE operations on Delta tables, essential for applying Change Data Capture (CDC) events.
Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores, Michael Armbrust, Tathagata Das, Liwen Sun, et al., 2020, Proceedings of the VLDB Endowment, Vol. 13, No. 12. DOI: 10.14778/3415478.3415560 - Academic paper introducing Delta Lake's design, ACID properties, and how it enables reliable data lake operations, including UPSERTs.
Structured Streaming Programming Guide, Apache Spark Project, 2024 - Details how to build scalable, fault-tolerant streaming applications using Spark, which is the engine for the 'apply' phase of the CDC pipeline.