Training on gpu(1) and gpu(2) allocates some memory on gpu(0)


#1

I am trying to train my model on gpu(1) and gpu(2), and have ensured that all the NDArrays and symbols are on the correct GPUs.
However,
-> when I call trainer.step() while watching nvidia-smi, gpu(0)'s memory gets filled with some data
-> all subsequent calls to trainer.step() have no further effect on gpu(0)

Debugging this in "mxnet/gluon/trainer.py", I found that this happens in _init_kvstore(self), where the line
kvstore.init(i, param_arrays[0]) allocates space on gpu(0), even though param_arrays[0] is on gpu(1).
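For context, here is a rough sketch of the kind of training setup that triggers this. The model, shapes, and hyperparameters are made up purely for illustration, and it assumes a machine with at least three GPUs:

import mxnet as mx
from mxnet import gluon, autograd

# gpu(0) is deliberately excluded from the context list
ctx = [mx.gpu(1), mx.gpu(2)]

net = gluon.nn.Dense(1)
net.initialize(ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1}, kvstore='device')

# one dummy batch slice per device
data = [mx.nd.ones((4, 8), ctx=c) for c in ctx]
label = [mx.nd.ones((4, 1), ctx=c) for c in ctx]
loss_fn = gluon.loss.L2Loss()

with autograd.record():
    losses = [loss_fn(net(x), y) for x, y in zip(data, label)]
for l in losses:
    l.backward()

# the first call to step() runs _init_kvstore(); watching nvidia-smi here,
# memory shows up on gpu(0) even though no array was placed there
trainer.step(4 * len(ctx))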

Minimal reproducible experiment:

import mxnet as mx
kv = mx.kvstore.create(name='device')
ones = mx.nd.ones((1, 1), ctx=mx.gpu(1))
kv.init(0, ones)  # this line allocates memory on gpu(0)
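As a quick sanity check (assuming the snippet above has just been run), the array's own context confirms it is not the device receiving the extra allocation:

print(ones.context)  # reports gpu(1); the extra memory seen in nvidia-smi is on gpu(0)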


#2

I can confirm this odd behaviour. Would you mind creating a GitHub issue to report it? It looks like a bug to me. Thanks!


#3

Someone is already looking into it.


#4

Fix: https://github.com/apache/incubator-mxnet/pull/11146