Fault Tolerance, The Apache Flink Community, 2025 (Apache Flink Project) - Official documentation for Flink's fault tolerance mechanisms, including restart strategies, checkpointing, and high availability configuration.
Apache Flink: Stream Processing at Scale, Alexander Alexandrov, Rico Bergmann, Paris Carbone, Maximilian E. Schwalbe, Marcin Staniewski, Kostas Tzoumas, Stephan Ewen, Robert Hirschfeld, Johannes Kirschnick, Johann-Christoph Freytag, Felix Naumann, 2017, Proceedings of the VLDB Endowment, Vol. 10 (VLDB Endowment), DOI: 10.14778/3137628.3137629 - Foundational paper detailing the design and implementation of Apache Flink, focusing on its distributed architecture and fault tolerance model through lightweight asynchronous snapshots.