Multi GPU training - hidden state error

Neil_Daftary · May 29, 2020, 6:56pm

I am working on multi-GPU training of RNNs using Gluon.

I have set the context as follows:
ctx = [mx.gpu(i) for i in range(num_gpus)]

And the begin state is defined as follows:
def begin_state(self, *args, **kwargs):
    return self.core.begin_state(*args, **kwargs)

But then the following code:
hidden = model.begin_state(func=mx.nd.zeros, batch_size=batch_size, ctx=ctx)

gives me the following error:
Invalid context string [gpu(0), gpu(1), gpu(2), gpu(3)]

I know this happens because I am passing a list of contexts instead of a single context. How can I distribute the hidden state across all the GPUs?
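
One possible approach, offered as a sketch rather than a confirmed fix, is to create a separate hidden state for each GPU, each sized for that GPU's share of the batch, instead of passing the whole context list to a single call. The helper name `per_device_states` and the `begin_state` wrapper argument are hypothetical and were chosen for illustration. The sketch assumes the batch is split evenly across devices.

```python
def per_device_states(begin_state, contexts, batch_size):
    """Return a list with one initial hidden state per context.

    begin_state: any callable accepting batch_size= and ctx= keyword
                 arguments (hypothetical wrapper around model.begin_state).
    contexts:    list of device contexts, e.g. [mx.gpu(0), mx.gpu(1)].
    batch_size:  global batch size, assumed divisible by len(contexts).
    """
    num_devices = len(contexts)
    if num_devices == 0:
        raise ValueError("contexts must not be empty")
    if batch_size % num_devices != 0:
        raise ValueError("batch_size must be divisible by the number of contexts")
    # Each device only sees its own slice of the batch.
    per_device = batch_size // num_devices
    # One call per device, each with a single context rather than a list.
    return [begin_state(batch_size=per_device, ctx=c) for c in contexts]


# Intended use with the code from the question (assumes mxnet is imported as mx):
#   hiddens = per_device_states(
#       lambda **kw: model.begin_state(func=mx.nd.zeros, **kw), ctx, batch_size)
#   parts = mx.gluon.utils.split_and_load(data, ctx)  # batch_axis=0 by default
#   for x, h in zip(parts, hiddens):
#       ...  # run the forward pass for this device with its own state h
```

Each element of the returned list lives on one device, so it can be paired with the matching data slice. If the input uses a time-major layout, the `batch_axis` argument of `split_and_load` would need to point at the batch dimension.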
