TFX: A TensorFlow-Based Production Machine Learning Platform, Clemens Mewald, Robert Crowe, Noah Fiedel, Konstantin Goldeli, Joshua T. Gordon, Denis Khayda, D. Sculley, Ben Vanik, and Jeremy Zurcher, 2019KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Association for Computing Machinery (ACM))DOI: 10.1145/3292500.3330663 - Describes Google's TFX platform, which established principles for production ML, including feature management and the need for data and model versioning to ensure reproducibility and reliability in large-scale systems.
Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores, Matei Zaharia, Tathagata Das, Rahul Goswami, Joseph Bradley, Xiangrui Meng, Liwen Sun, Burak Yavuz, Arsalan Tavakoli, Ali Ghodsi, and Michael Armbrust, 2020Proceedings of the VLDB Endowment, Vol. 13 (VLDB Endowment)DOI: 10.14778/3415494.3415512 - This paper introduces Delta Lake, a technology that enables time-travel queries and ACID transactions on data lakes, which is fundamental to implementing time-based feature versioning strategies.
Semantic Versioning 2.0.0, Tom Preston-Werner, 2013 - The official specification for semantic versioning, a widely adopted standard for software that can be adapted for versioning feature definitions and transformation code.