Max-norm constraint / regularizer on different layers

I would like to implement max-norm regularization as in this paper: http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf

The idea is that I want to clip the weights by the L2 norm of the network's weights after the optimizer updates them. This L2 norm is computed differently for different layers (such as conv and fc).

Does anyone have a suggestion for how to do this in MXNet? Preferably with a simple modification, so that I can still use the multi-GPU training APIs.

Have you solved the problem?

You can use mx.gluon.utils.clip_global_norm for this.

Below is the usual code for training a network (forward, backward, and update steps).

import mxnet as mx


# define network
net = mx.gluon.nn.HybridSequential()
net.add(mx.gluon.nn.Dense(units=5))
net.add(mx.gluon.nn.Dense(units=4))
net.add(mx.gluon.nn.Dense(units=3))
# initialize and setup trainer
net.initialize()
optimizer = mx.optimizer.SGD(learning_rate=1)
trainer = mx.gluon.Trainer(net.collect_params(), optimizer=optimizer)
# forward + backward pass, and update weights
data = mx.nd.random.uniform(shape=(10,6))
with mx.autograd.record():
    output = net(data)
output.backward()
trainer.step(data.shape[0])
# show an example weight
print(net[0].weight.data())
[[-0.06072712  0.05941519  0.03317399 -0.05293032  0.05430559  0.01910102]
 [-0.03364548 -0.05314463 -0.01834892 -0.01944483 -0.0189993  -0.02578755]
 [ 0.03851217  0.02879925 -0.03378715  0.0451647  -0.04305394  0.00303232]
 [-0.01020094 -0.03154465 -0.02983426  0.05084194  0.04505462 -0.02282878]
 [-0.02955762  0.04034118  0.00601154  0.01511913  0.06757414 -0.03696253]]
<NDArray 5x6 @cpu(0)>

Then get the data from all of the network parameters and clip them to max_norm. clip_global_norm rescales every array in place when their combined L2 norm exceeds max_norm.

net_params = [p.data() for p in net.collect_params().values()]
max_norm = 0.1
mx.gluon.utils.clip_global_norm(net_params, max_norm=max_norm)
print(net[0].weight.data())
[[-0.00343347  0.00335929  0.00187563 -0.00299264  0.0030704   0.00107996]
 [-0.00190229 -0.00300476 -0.00103743 -0.0010994  -0.00107421 -0.00145801]
 [ 0.00217745  0.00162829 -0.0019103   0.00255358 -0.00243424  0.00017145]
 [-0.00057675 -0.00178351 -0.00168681  0.00287457  0.00254736 -0.00129072]
 [-0.00167117  0.00228086  0.00033989  0.00085482  0.00382059 -0.00208983]]
<NDArray 5x6 @cpu(0)>

And we see a change in the weights.
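
Note that clip_global_norm clips the combined norm of all parameters together, whereas the question asks for the norm to be computed per layer (conv vs. fc). If you need that per-layer behaviour, a minimal sketch along the lines below should work; apply_max_norm and the example limits (3.0, 2.0) are just illustrative, not an existing MXNet API.

def apply_max_norm(net, max_norms):
    # max_norms maps a substring of the parameter name to its max L2 norm,
    # so conv and fc (dense) layers can be given different limits.
    for name, param in net.collect_params().items():
        if not name.endswith('weight'):
            continue  # leave biases untouched
        max_norm = next((v for k, v in max_norms.items() if k in name), None)
        if max_norm is None:
            continue
        for arr in param.list_data():  # one copy per context, so multi-GPU is covered
            norm = arr.norm().asscalar()
            if norm > max_norm:
                arr *= max_norm / norm  # rescale in place so the L2 norm equals max_norm

# call it right after each update, e.g.
# trainer.step(batch_size)
# apply_max_norm(net, {'conv': 3.0, 'dense': 2.0})

This keeps each layer's weights within its own limit after every update, which is closer to the per-layer max-norm constraint the question describes, and since it rescales the copy on each context it fits the multi-GPU Trainer setup.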