I am trying to train my model on gpu(1) and gpu(2), and I have ensured that all NDArrays and symbols are on the correct GPUs.
-> when I call trainer.step() while watching nvidia-smi, gpu(0)'s memory gets filled with some data
-> all subsequent calls to trainer.step() have no further effect on gpu(0)
Debugging this in mxnet/gluon/trainer.py, I found that this happens in _init_kvstore(self), where the line
kvstore.init(i, param_arrays) allocates space on gpu(0), even though param_arrays is on gpu(1).
Minimum reproducible experiment:
import mxnet as mx

kv = mx.kv.create('device')
ones = mx.nd.ones((2, 2), ctx=mx.gpu(1))
kv.init(0, ones)  # ====> This line will allocate memory on gpu(0)
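As a workaround until the kvstore behavior is fixed, one option is to hide gpu(0) from the process entirely. CUDA remaps device ordinals based on CUDA_VISIBLE_DEVICES, so inside the process mx.gpu(0) then refers to physical GPU 1 and nothing can touch physical GPU 0. This is a sketch; train.py stands in for whatever script you launch:

```shell
# Assumption: train.py is your training script (placeholder name).
# Only physical GPUs 1 and 2 are visible to the process, re-numbered
# as device 0 and 1, so the kvstore cannot allocate on physical GPU 0.
CUDA_VISIBLE_DEVICES=1,2 python train.py
```

Note that with this approach the script itself should use mx.gpu(0) and mx.gpu(1), since the visible devices are renumbered from zero.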