Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Haoyuan Li, Scott Shenker, and Ion Stoica, 2012, USENIX Conference on Networked Systems Design and Implementation (NSDI), DOI: 10.5555/2632222.2632227 - Describes the core Resilient Distributed Dataset (RDD) abstraction, which is fundamental to Apache Spark's distributed computation model and fault tolerance capabilities.
Apache Spark Documentation, The Apache Software Foundation, 2024 - Provides comprehensive technical details, usage guides, and optimization strategies for Apache Spark, covering its APIs, configuration, and performance tuning for large-scale data processing.
Apache Flink Documentation, The Apache Software Foundation, 2025 - Offers complete documentation for Apache Flink, including its batch processing capabilities, Table API & SQL, and deployment guides relevant for large-scale feature computation.
Michelangelo: Uber's Machine Learning Platform, Sibylle Lanz, Jeremy H. Hong, Kai Jiang, Mayur Rustagi, Gaurav Singh, Mike Wu, David Xiao, and Andrew P. Zang, 2017, 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (IEEE), DOI: 10.1109/DSAAE.2017.8286940 - Presents Uber's end-to-end machine learning platform, Michelangelo, detailing its architecture and components, including the feature store, which relies on scalable offline computation. The URL points to an accessible summary of the paper.