Distributed training questions

Hello again,

Two more questions:

  1. When I use the parameter server mode (based on the official MXNet tutorial; my full code is here) and print the test loss/accuracy etc., the values differ between machines:
Epoch 0: Test_mcc 0.386387: test_Tnmt: 0.671969
Epoch 0: Test_mcc 0.370691: test_Tnmt: 0.693026

Question: is this happening because each worker has a different set of initial weights? Having followed the Horovod distributed training tutorial (I haven't managed to make it work yet), they explicitly mention broadcasting the parameters to all workers. Do we need to manually broadcast the parameters to all workers before training (or give every machine the same seed), or is this taken care of for us? (A minimal sketch of what I mean is after the second question below.)

  2. In the same Horovod tutorial, they mention that a server-to-worker ratio of ~2 gives better scaling performance for parameter server training. Is this universal, i.e. does it apply to most problems/training setups?
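
For reference, here is roughly what I mean in question 1 by "same seed" vs. "broadcast" (a minimal sketch only, assuming Horovod for MXNet is installed; `net` is just a toy placeholder, not my actual model):

```python
import mxnet as mx
import horovod.mxnet as hvd
from mxnet import gluon, init

hvd.init()

# Option A: give every machine the same seed before initializing
mx.random.seed(42)

net = gluon.nn.Dense(2, in_units=10)   # placeholder model for illustration
net.initialize(init.Xavier(), ctx=mx.cpu())

# Option B: initialize normally and broadcast rank 0's weights to all workers,
# as the Horovod tutorial suggests
hvd.broadcast_parameters(net.collect_params(), root_rank=0)
```

Is something equivalent to either option needed (or already done internally) when training with the dist kvstore / parameter server setup?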

Thank you for your time,
Foivos