Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Haoyuan Li, Scott Shenker, Ion Stoica, 2012. 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI '12) (USENIX Association). DOI: 10.5555/2393325.2393341 - This foundational paper introduces Resilient Distributed Datasets (RDDs), the core abstraction behind Apache Spark, which is vital for scalable and fault-tolerant distributed data processing.