Max-norm constraint / regularizer on different layers


#1

I would like to implement max-norm regularization as described in this paper: http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf

The idea is that after the optimizer updates the weights, I want to rescale each weight tensor so that its L2 norm does not exceed a fixed cap. This L2 norm is computed differently for different layer types (e.g. conv vs. fc).
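
Concretely, something like the sketch below is what I have in mind (using the Gluon API; `apply_max_norm` and the cap `max_norm=3.0` are just placeholders of mine, since the paper treats the cap as a hyperparameter to tune):

```python
from mxnet import nd

def apply_max_norm(params, max_norm=3.0):
    # Rescale each weight tensor so its per-output-unit L2 norm <= max_norm.
    # For an fc weight of shape (out, in) the norm is over the input axis;
    # for a conv weight of shape (out, in, kh, kw) it is over (in, kh, kw),
    # i.e. one norm per output unit / filter in both cases.
    for name, param in params.items():
        if not name.endswith('weight'):
            continue
        w = param.data()                # weight values on the first context
        axes = tuple(range(1, w.ndim))  # every axis except the output axis
        norms = nd.sqrt(nd.sum(w * w, axis=axes, keepdims=True))
        clipped = nd.clip(norms, 0.0, max_norm)
        # scale factor is 1 where the norm is already under the cap
        param.set_data(w * (clipped / (norms + 1e-12)))
```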

Does anyone have a suggestion for how to do this in MXNet? Preferably with a simple modification, so that I can still use the multi-GPU training APIs.
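
For context, this is roughly where I would call it in the training loop (a sketch assuming a `gluon.Trainer` named `trainer`, a network `net`, and a `batch_size`):

```python
trainer.step(batch_size)                            # optimizer update
apply_max_norm(net.collect_params(), max_norm=3.0)  # then clip the norms
```

Since `Parameter.set_data` writes the value back to every device the parameter lives on, my hope is that this stays compatible with multi-GPU training, but I'm not sure it's the idiomatic way to do it.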