Apache Iceberg: A Table Format for Analytic Datasets, Ryan Blue, Daniel Weeks, Parosh Jasani, Justin S. Johnson, Andrew Gallagher, Jason D. Reid, Brock Noland, Michael Yoder, Steven Hamm, 2020. Proceedings of the VLDB Endowment (PVLDB), Vol. 13. VLDB Endowment. DOI: 10.14778/3407914.3407920 - Introduces Apache Iceberg and describes its architecture and how it addresses challenges such as schema evolution and hidden partitioning in large-scale data lakes.
Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores, Michael Armbrust, Tathagata Das, Liwen Sun, Burak Yavuz, et al., 2020. Proceedings of the VLDB Endowment (PVLDB), Vol. 13, No. 12. VLDB Endowment. DOI: 10.14778/3415478.3415560 - Presents Delta Lake, an open-source storage layer that brings ACID transactions to tables kept in cloud object stores, with schema enforcement and support for schema evolution.
Structured Streaming Programming Guide, The Apache Software Foundation, 2024 - Its coverage of schema handling for streaming sources and of which query changes are allowed when restarting from an existing checkpoint gives practical details for evolving schemas in Apache Spark streaming pipelines.