What happens when I create two dist-* kvstores?


#1

I wrote a Python program:

# file : test.py
import mxnet as mx
kv0 = mx.kvstore.create('dist-sync')
kv1 = mx.kvstore.create('dist-sync')
print('something')

I then use launch.py to start distributed training:

${MXNET_PATH}/tools/launch.py -n 3 -H hosts --launcher ssh python test.py

The program hangs, and I don’t know why.


#2

Hi @ZhouJ

Why do you need two kvstores? You should be able to use just a single distributed kvstore. Does the launch.py script work with a single kvstore?


#3

Thank you.

  1. launch.py works okay with a single kvstore.
  2. I built mx.module.Module with an updater, which registers that updater on the kvstore.
    However, I also want to transfer some data between workers without any unnecessary computation, so I want a second kvstore that has no updater.
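
To make point 2 concrete, here is a toy, in-memory sketch of the difference between a store with an updater and one without. It is not MXNet code: ToyKVStore, set_updater, and sgd_like_update are hypothetical names made up for illustration. It also assumes that, without an updater, a pushed value simply replaces the stored one.

```python
# Toy model only: ToyKVStore is a hypothetical stand-in, not MXNet's kvstore.

class ToyKVStore:
    def __init__(self):
        self._store = {}
        self._updater = None

    def set_updater(self, updater):
        # Register a function that decides how a pushed value
        # is merged into the stored value.
        self._updater = updater

    def init(self, key, value):
        self._store[key] = value

    def push(self, key, value):
        if self._updater is None:
            # Assumption: with no updater, the pushed value replaces the stored one.
            self._store[key] = value
        else:
            # With an updater, the pushed value is treated as an input to it
            # (for example, a gradient applied to a weight).
            self._store[key] = self._updater(key, value, self._store[key])

    def pull(self, key):
        return self._store[key]


def sgd_like_update(key, grad, weight, lr=0.5):
    # Hypothetical updater: one plain gradient-descent step.
    return weight - lr * grad


# Store used for training: pushes go through the updater.
kv_train = ToyKVStore()
kv_train.set_updater(sgd_like_update)
kv_train.init("w", 1.0)
kv_train.push("w", 0.5)
trained = kv_train.pull("w")

# Store used only to move data: pushes are stored unchanged.
kv_data = ToyKVStore()
kv_data.init("msg", 0.0)
kv_data.push("msg", 0.5)
raw = kv_data.pull("msg")
```

With the updater registered, the pushed 0.5 is interpreted as a gradient and changes the stored weight. Without one, the same 0.5 comes back unchanged, which is the behaviour I want for plain data transfer.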

#4

To solve my problem, I added a function named push_no_update to kvstore, and it works.
But I still don’t know why the program hangs when I create two kvstores.