Advances and Open Problems in Federated Learning, Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D’Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, Justin Hsu, Martin Jaggi, Tara Javidi, Gauri Joshi, Mikhail Khodak, Jakub Konečný, Aleksandra Korolova, Farinaz Koushanfar, Sanmi Koyejo, Tancrède Lepoint, Yang Liu, Prateek Mittal, Mehryar Mohri, Richard Nock, Ayfer Özgür, Rasmus Pagh, Hang Qi, Daniel Ramage, Ramesh Raskar, Mariana Raykova, Dawn Song, Weikang Song, Sebastian U. Stich, Ziteng Sun, Ananda Theertha Suresh, Florian Tramèr, Praneeth Vepakomma, Jianyu Wang, Li Xiong, Zheng Xu, Qiang Yang, Felix X. Yu, Han Yu and Sen Zhao, 2021. Foundations and Trends® in Machine Learning, Vol. 14 (now publishers). DOI: 10.1561/2200000083 - This comprehensive survey discusses system heterogeneity, privacy constraints, and communication bottlenecks in federated learning, providing broad context for monitoring and debugging.
FedMeter: An Efficient and Reliable Federated Learning Performance Meter, Wenhua Zhang, Ming Zhang, Kuan Chen, Kun Huang, Wei Liu and Yue Zhang, 2022. Proceedings of the 2022 International Conference on Data Mining Workshops (ICDMW) (IEEE). DOI: 10.1109/ICDMW58026.2022.00139 - This paper introduces FedMeter, a system designed for efficient and reliable performance monitoring in federated learning environments, addressing many of the monitoring challenges discussed.