Hi,
I have a use case where I want to train a model with distributed SGD, using worker nodes in different locations (cities or countries), each of which holds its own share of the data. The nodes should exchange only gradients and parameters with each other, never raw data.
Is there any reason why this would not be possible to implement with MXNet’s default parameter server?
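
To make the setup concrete, here is a rough sketch of the communication pattern I have in mind. It is a toy simulation in plain Python, not MXNet code: the `ParameterServer` and `Worker` classes, the `train` function, and the linear-regression data are all made up for illustration. As far as I understand, the closest thing in MXNet would be a kvstore of type `dist_sync`, but that is my assumption.

```python
# Toy simulation of synchronous parameter-server SGD.
# Hypothetical names throughout; this does not use or mirror the MXNet API.

class ParameterServer:
    """Holds the global parameters and applies averaged gradients."""

    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr

    def pull(self):
        # Workers receive a copy of the current parameters.
        return list(self.params)

    def push(self, grads_from_workers):
        # Average the gradients from all workers, then take one SGD step.
        n = len(grads_from_workers)
        for i in range(len(self.params)):
            avg = sum(g[i] for g in grads_from_workers) / n
            self.params[i] -= self.lr * avg


class Worker:
    """Owns a local data shard; only gradients ever leave this object."""

    def __init__(self, local_data):
        self._data = local_data  # raw (x, y) pairs stay on the worker

    def gradient(self, params):
        # Gradient of the mean squared error of y = w * x + b on the local shard.
        w, b = params
        gw = gb = 0.0
        for x, y in self._data:
            err = (w * x + b) - y
            gw += 2 * err * x
            gb += 2 * err
        n = len(self._data)
        return [gw / n, gb / n]


def train(rounds=500, lr=0.05):
    # Synthetic data from y = 2x + 1, split into three equal shards
    # standing in for three remote sites.
    data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(-15, 15)]
    workers = [Worker(data[i::3]) for i in range(3)]
    server = ParameterServer([0.0, 0.0], lr)
    for _ in range(rounds):
        params = server.pull()
        server.push([wk.gradient(params) for wk in workers])
    return server.pull()
```

Calling `train()` returns the learned `[w, b]`. The point is only that the server sees parameters and gradients, while each shard stays inside its `Worker`. The sketch ignores latency, stragglers, and node failures, which are the parts I am unsure about across cities or countries.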