Apache Iceberg: An Open Table Format for Large-Scale Analytic Datasets, Ryan Blue, Daniel Weeks, Jeremy Klop, and Jason Zhang, 2018, Proceedings of the VLDB Endowment, Vol. 11 (VLDB Endowment Inc.), DOI: 10.14778/3275727.3275730 - Describes the design and benefits of Apache Iceberg, including its approach to hidden partitioning and schema evolution, which addresses limitations of traditional Hive-style partitioning.
Apache Hive Documentation - Partitioned Tables, The Apache Hive Community, 2023 (Apache Software Foundation) - Explains the mechanics and benefits of traditional Hive-style partitioning, which forms the basis of directory-based indexing in data lakes.
Optimizing Amazon S3 performance for data lakes, Amazon Web Services, 2019 (AWS) - Discusses practical strategies for optimizing data lake performance on Amazon S3, including the importance of partitioning, file sizing, and other architectural considerations.