TFX: A TensorFlow-Based Production Machine Learning Platform, Denis Baylor, Eric Breck, Heng-Tze Cheng, Noah Fiedel, Chuan Yu Foo, Zakaria Haque, Salem Haykal, Mustafa Ispir, Vihan Jain, Levent Koc, Chiu-Yuen Koo, Lukasz Lew, Clemens Mewald, Akshay Naresh Modi, Neoklis Polyzotis, Sukriti Ramesh, Sudip Roy, Steven Euijong Whang, Martin Wicke, Jarek Wilkiewicz, Xin Zhang, Martin A. Zinkevich, 2017, Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM), DOI: 10.1145/3097983.3098021 - Presents TensorFlow Extended (TFX) and details its components, such as TensorFlow Data Validation (TFDV), which provides tools for detecting data anomalies and training-serving skew through statistical analysis.
Machine Learning Design Patterns, Valliappa Lakshmanan, Sara Robinson, Michael Munn, 2020 (O'Reilly Media) - This book offers practical design patterns for building reliable machine learning systems, including strategies for consistent feature engineering, data validation, and mitigating training-serving skew.
Hidden Technical Debt in Machine Learning Systems, D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, Dan Dennison, 2015, Advances in Neural Information Processing Systems, Vol. 28 (NeurIPS Proceedings) - A widely cited paper that identifies forms of technical debt specific to machine learning systems, such as data dependencies and model update cycles, many of which can lead to online/offline skew.