Hey guys, I’m wondering if this is the right way to clip gradients across multiple GPUs (using data parallelism)?
grads = [p.grad(ctx) for ctx in ctxs for p in model.collect_params().values()]
gluon.utils.clip_global_norm(grads, args.clipping_theta * seq_len * batch_size)
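For context, this runs right after the backward pass, in a step roughly like the one below (just a sketch; loss_fn, data_shards, label_shards, and trainer stand in for things defined elsewhere in my script):

from mxnet import autograd

# Rough sketch of the surrounding training step; the names are placeholders.
with autograd.record():
    losses = [loss_fn(model(X), y)
              for X, y in zip(data_shards, label_shards)]
for l in losses:
    l.backward()
# ...the clipping lines above go here, then:
trainer.step(batch_size)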
Hi @ShootingSpace,
You should be able to configure gradient clipping through the Optimizer given to the Trainer object. Check out the clip_gradient
argument of Optimizer; you can pass it through Trainer's optimizer_params,
as follows:
import mxnet

trainer = mxnet.gluon.Trainer(net.collect_params(), optimizer='sgd',
                              optimizer_params={'learning_rate': 0.1, 'clip_gradient': 5},
                              kvstore='device')  # 'device' aggregates gradients on the GPUs
That should work fine across multiple GPUs. One thing to note: clip_gradient clips element-wise, so in this example each gradient value is clamped to the range [-5, 5]; it does not rescale by the global norm the way clip_global_norm does.
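If you specifically want norm-based clipping as in your snippet, the pattern I've usually seen is to clip per context rather than pooling every device's gradients into one list. A rough sketch, assuming model, ctxs, trainer, and your clipping_theta * seq_len * batch_size scaling carry over from your code:

from mxnet import gluon

# Sketch only: model, ctxs, trainer, clipping_theta, seq_len, and batch_size
# are assumed to exist as in the snippet above, and loss.backward() has
# already been called on every context.
for ctx in ctxs:
    # Take this device's copy of every parameter gradient...
    grads = [p.grad(ctx) for p in model.collect_params().values()]
    # ...and rescale them in place if their combined L2 norm exceeds the limit.
    gluon.utils.clip_global_norm(grads, clipping_theta * seq_len * batch_size)
trainer.step(batch_size)

Each device holds the gradients for its own slice of the batch, so clipping per device keeps each norm computation local instead of mixing several copies of the same parameter into one norm.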