With the rise of massive networks, there is a push to find new ways to parallelize training and make research on them more accessible. The sheer size of these networks, which now reach into the billions of parameters, cuts off many researchers from making advances in the field. There have been various attempts to fix this issue, and many proposed techniques make it possible to exploit parallel architectures. This paper attempts to train large models on many servers that would normally be considered too unstable or too limited for that purpose.


The main thrust of this paper is combining several known techniques for large model training with a peer-to-peer style network design that maintains redundancy: when a peer disconnects, the peers that precede it can reroute their work to another peer. This seems to have allowed the authors to be as effective as other techniques while utilizing nodes that are normally considered unsuitable. In many ways this would still be an extremely expensive way to train a large model, and therefore it doesn’t necessarily open up the field to everyone, but it does give another option, and that is a welcome change. The use of spot instances, that is, instances that can be stopped by the cloud provider, would be much cheaper than the traditional use of dedicated high-performance instances.
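
To make the rerouting idea concrete, here is a minimal sketch of how I understand it. It is not the authors' implementation: the names Peer, send_to_stage and run_pipeline are hypothetical, each stage is reduced to a toy function, and a disconnection is simulated with a flag rather than a real network failure.

```python
import random
from dataclasses import dataclass


@dataclass
class Peer:
    """Hypothetical stand-in for one unreliable server in a pipeline stage."""
    name: str
    alive: bool = True

    def forward(self, x: float) -> float:
        # Toy computation standing in for one pipeline stage on a batch.
        if not self.alive:
            raise ConnectionError(f"{self.name} disconnected")
        return x + 1.0


def send_to_stage(stage: list[Peer], x: float, rng: random.Random) -> float:
    """Try the peers of the next stage in random order until one responds."""
    candidates = stage[:]
    rng.shuffle(candidates)
    for peer in candidates:
        try:
            return peer.forward(x)
        except ConnectionError:
            continue  # reroute: the sender simply tries another peer
    raise RuntimeError("no live peer left in this stage")


def run_pipeline(stages: list[list[Peer]], x: float, seed: int = 0) -> float:
    """Pass x through every stage, rerouting around disconnected peers."""
    rng = random.Random(seed)
    for stage in stages:
        x = send_to_stage(stage, x, rng)
    return x
```

For example, with two stages that each hold one live and one disconnected peer, run_pipeline still completes, because the sender falls back to the live peer in each stage. Training only stalls if every peer in some stage is gone, which is the redundancy argument in miniature.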

I think a really strong point of this paper is its supplementary material, which answers many questions and goes into further detail in a way that is very encouraging.