I will be honest in the introduction here: when selecting this paper I had little exposure to federated learning and wanted to learn more. I lucked out with this paper, as it is a great introduction to the concept. The main idea is that multiple groups each have their own dataset that they do not wish to share, yet they all want a model that has been trained on every group's data. Federated training allows the parameter updates from training on each group's dataset to be combined into a single model that approximates what would have resulted if all the datasets had been pooled into one meta dataset.
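To make the combining step concrete, here is a minimal sketch of federated averaging (FedAvg), the classic instance of this idea: each group runs a local training step on its private data, and only the resulting parameters are averaged, weighted by dataset size. This is an illustration of the general concept, not the specific scheduling method the paper proposes; the data and the least-squares objective are made up for the example.

```python
import numpy as np

def local_update(params, data, lr=0.1):
    # Hypothetical local step: one gradient-descent update on a
    # least-squares objective over this group's private data.
    X, y = data
    grad = X.T @ (X @ params - y) / len(y)
    return params - lr * grad

def federated_round(params, groups):
    # Each group trains locally; only the updated parameters
    # (never the raw data) are sent back and averaged,
    # weighted by each group's dataset size (FedAvg-style).
    updates = [local_update(params.copy(), d) for d in groups]
    sizes = np.array([len(d[1]) for d in groups], dtype=float)
    weights = sizes / sizes.sum()
    return sum(w * u for w, u in zip(weights, updates))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
# Three "groups", each holding a private dataset of a different size.
groups = []
for n in (50, 80, 120):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.01 * rng.normal(size=n)
    groups.append((X, y))

params = np.zeros(2)
for _ in range(200):
    params = federated_round(params, groups)
print(params)  # approaches [2.0, -1.0] without pooling the raw data
```

Even though no group ever reveals its data, the averaged model converges close to the parameters a combined dataset would have produced.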
One difficulty of this approach is that each group may be training on a different cluster of GPUs, and the clusters may therefore differ in computational power. This can lead to situations where updates can only occur at the speed of the slowest cluster. Differing network capabilities can also oversaturate the network, creating an additional bottleneck in training. The paper gives not only a good description of the problem it is trying to solve but also a unique solution that makes federated training between clusters of different capabilities more efficient.
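The straggler effect described above is easy to quantify: in a synchronous round, the server waits for every cluster, so one slow participant sets the pace for all. The cluster names and timings below are made-up numbers for illustration only.

```python
# Hypothetical per-round compute times (seconds) for four clusters
# with different GPU capability; names and numbers are illustrative.
cluster_times = {"A": 1.0, "B": 1.2, "C": 1.5, "D": 6.0}

# A synchronous federated round finishes only when the slowest
# cluster does, so the round time is the maximum, not the mean.
round_time = max(cluster_times.values())
mean_time = sum(cluster_times.values()) / len(cluster_times)

print(round_time)              # 6.0 — cluster D alone dictates the pace
print(round_time / mean_time)  # ~2.47x slower than the average cluster
```

This gap between the maximum and the mean is exactly the inefficiency that heterogeneity-aware scheduling tries to close.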
FEDML PARROT: A SCALABLE FEDERATED LEARNING SYSTEM VIA HETEROGENEITY-AWARE SCHEDULING ON SEQUENTIAL AND HIERARCHICAL TRAINING
Zhenheng Tang, Xiaowen Chu, Ryan Yide Ran, Sunwoo Lee, Shaohuai Shi, Yonggang Zhang, Yuxin Wang, Alex Qiaozhong Liang, Salman Avestimehr, Chaoyang He