Trino Documentation, The Trino Community, 2024 - Official documentation for the Trino distributed SQL query engine, detailing its architecture, query lifecycle, and optimization techniques.
Apache Spark SQL, DataFrames and Datasets Guide, The Apache Software Foundation, 2024 - Official guide to Apache Spark SQL, covering its distributed execution model, query optimization with Catalyst, and mechanisms for handling memory and disk spills.
Query Processing in the Age of Data Lakes, Marcin Zukowski, Matt Fuller, Martin Traverso, David Philippi, 2019, Proceedings of the VLDB Endowment (PVLDB), Vol. 12 (VLDB Endowment), DOI: 10.14778/3352063.3352079 - This paper discusses the challenges and design considerations for distributed SQL query engines operating on data lakes, including object storage and decoupling compute from storage.
Presto: The Definitive Guide: SQL at Any Scale, on Any Storage, in Any Environment, Matt Fuller, Manfred Moser, Martin Traverso, 2020 (O'Reilly Media) - A comprehensive guide covering the architecture, implementation, and usage of Presto (now Trino), focusing on its massively parallel processing capabilities and data lake integration.