Multi-GPU training: hidden state error

I am working on multi-GPU training of RNNs using Gluon.

I have set the context as follows:
ctx = [mx.gpu(i) for i in range(num_gpus)]

And the begin state is defined as follows:
def begin_state(self, *args, **kwargs):
    return self.core.begin_state(*args, **kwargs)

But then the following code:
hidden = model.begin_state(func=mx.nd.zeros, batch_size=batch_size, ctx=ctx)

gives me the following error:
Invalid context string [gpu(0), gpu(1), gpu(2), gpu(3)]

I know this happens because I am passing a list of contexts instead of a single context. How can I distribute the hidden state across all the GPUs?