The Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics, Michael Armbrust, Ali Ghodsi, Reynold S. Xin, Matei Zaharia, 2021, 11th Biennial Conference on Innovative Data Systems Research (CIDR '21) - Introduces the Lakehouse architecture, which combines the flexibility of data lakes with data warehouse features. It is highly relevant for building scalable offline feature stores on modern table formats.
Apache Parquet, The Apache Software Foundation, 2024 - The official documentation for Apache Parquet, a columnar storage format crucial for efficient storage, compression, and retrieval of large-scale feature data in offline stores.