Apache Parquet Format, Apache Software Foundation, 2025 - The official specification for the Apache Parquet columnar file format, detailing its structure (row groups, column chunks, and pages) and how compression codecs are applied so that files stay splittable for parallel processing.
Zstandard: Fast Real-time Compression Algorithm, Yann Collet, Pascal Massimino, Jean-Marc Odobez, 2016 USENIX Annual Technical Conference (ATC) (USENIX Association) DOI: 10.5555/3008035.3008044 - Presents the Zstandard compression algorithm, detailing its design for achieving high compression ratios with fast decompression speeds, a significant improvement over prior codecs.
Snappy: A Fast Compressor/Decompressor, Google, 2024 - The official project repository for Google Snappy, describing its design goal of very high compression and decompression speed at reasonable compression ratios, which suits interactive data processing.
Designing Data-Intensive Applications, Martin Kleppmann, 2017 (O'Reilly Media) - A comprehensive book on the fundamental principles of designing scalable data systems, covering data storage, processing, and distributed architectures pertinent to data lake optimization.