Hi, I am trying to run the provided distributed training example on 2 different nodes with the following internal IP addresses:
user1@111.111.111.121
user2@111.111.111.122
I created a hosts file with these IPs, and SSH from one machine to the other works without any issue. When I launch the code:
python ../../tools/launch.py -n 2 --launcher ssh -H hosts python train_mnist.py --network lenet --kv-store dist_device_sync
it prints the following password prompts, all at the same time:
user1@111.111.111.121's password: user1@111.111.111.121's password: user2@111.111.111.122's password: user2@111.111.111.122's password:
Both machines use the same admin password, but no matter how many times I try, it just prompts "Permission denied, please try again."
Is there any way I can get debug messages about what is really happening in the background?
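
One way to narrow this down might be to test non-interactive SSH login to each host separately from launch.py. The sketch below is only an illustration under assumptions, not part of the MXNet example: it assumes the hosts file has one `user@ip` (or plain `ip`) entry per line, and the script name `check_ssh.py` and the helper names are hypothetical. It runs `ssh -v -o BatchMode=yes`, which turns on verbose client output and disables password prompts, so a host without working key-based login fails right away and its debug output can be read.

```python
import subprocess
import sys


def parse_hosts(text):
    """Return the non-empty, stripped lines of a hosts file (assumed one host per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_ssh_check(host):
    """Build an ssh command that logs in non-interactively and runs `true`.

    -v prints the client's debug messages (on stderr);
    BatchMode=yes makes ssh fail instead of asking for a password.
    """
    return ["ssh", "-v", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def check_hosts(hosts):
    """Try each host in turn and print the tail of the debug output on failure."""
    for host in hosts:
        result = subprocess.run(build_ssh_check(host), capture_output=True, text=True)
        if result.returncode == 0:
            print(f"{host}: key-based login OK")
        else:
            print(f"{host}: login FAILED (exit code {result.returncode})")
            # The last lines of the verbose output usually show which auth methods were tried.
            for line in result.stderr.splitlines()[-15:]:
                print("    " + line)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as f:
            check_hosts(parse_hosts(f.read()))
    else:
        print("usage: python check_ssh.py <hosts-file>")
```

Running something like `python check_ssh.py hosts` on the launching machine would show, per host, whether login succeeds without a password. If it fails here, the overlapping password prompts from the launcher are probably the same problem, since several ssh sessions asking for a password at once cannot be answered reliably from one terminal.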