Regularizer for Custom Parameter

Hi,
I am trying to port some TensorFlow code to MXNet Gluon and would like your help. In TensorFlow, when creating a trainable Variable (the equivalent of a Parameter in MXNet Gluon), there is a way to regularize just that variable: see the “regularizer” arg in the TensorFlow Variable API. For example, I can choose to apply an L2 penalty to just that trainable Variable/Parameter.
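
For reference, this is roughly the pattern I mean (a sketch only; tf.contrib.layers.l2_regularizer and the 0.1 scale are just illustrative choices):

import tensorflow as tf

# TF 1.x: the regularizer arg attaches an L2 penalty to this one variable only;
# the resulting loss ends up in the tf.GraphKeys.REGULARIZATION_LOSSES collection
v = tf.get_variable('customParam', shape=(6,), dtype=tf.float32,
                    regularizer=tf.contrib.layers.l2_regularizer(0.1))
reg_losses = tf.losses.get_regularization_losses()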

What is the equivalent in MXNet Gluon?

I haven’t really done anything like that myself, but there is a wd_mult property on a Parameter object, which might be what you need.

I would try setting it to 0 for all Parameters that shouldn’t participate in regularization and see what happens — something like the sketch below.
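
A minimal sketch of that idea, using a plain Dense block as a stand-in for your network (the block and parameter names here are placeholders):

import mxnet as mx
from mxnet import gluon

net = gluon.nn.Dense(6)        # placeholder block, just for illustration
net.initialize()

# turn off weight decay for every parameter...
for param in net.collect_params().values():
    param.wd_mult = 0.0

# ...then re-enable it only for the parameter you want regularized;
# Dense exposes its weight Parameter as .weight
net.weight.wd_mult = 1.0

# the effective decay for each parameter is wd * wd_mult
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1, 'wd': 0.1})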


@Sergey thank you for the reply. Basically, this is what I have in TensorFlow:

Create a custom variable

customParam = tf.get_variable('customParam', shape=(6,), dtype=tf.float32,
        initializer=tf.random_uniform_initializer(minval=-2, maxval=2, seed=666)) 

with tf.name_scope('customParam/Regularizer'):  # following slim's naming convention
    reg = tf.to_float(0.1)
    customParam_reg = tf.identity(reg * tf.nn.l2_loss(customParam), name='l2_regularizer')
    tf.losses.add_loss(customParam_reg, tf.GraphKeys.REGULARIZATION_LOSSES)

This is what I did in MXNet, without the regularization:

class CustomModel(HybridBlock):
    def __init__(self, **kwargs):
        super(CustomModel, self).__init__(**kwargs)
        self.customParam = self.params.get('customParam', grad_req='write', shape=(6,), dtype='float32')

How should I specify wd_mult to get the same behavior as in TensorFlow?

You need to set wd_mult when you define the Parameter. Here is a full, deterministic example (with the seed from your example). Notice that I had to set the global wd to 1, because wd_mult works as a multiplier: if wd were zero, the multiplied value would also be zero.

import numpy as np
import mxnet as mx
from mxnet import gluon, autograd
from mxnet.gluon import Trainer
from mxnet.gluon.loss import L2Loss

# fix the seed
np.random.seed(666)
mx.random.seed(666)


class RegularizedFullyConnected(gluon.HybridBlock):
    def __init__(self, hidden_units, wd_mult=1.0):
        super(RegularizedFullyConnected, self).__init__()

        with self.name_scope():
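            # wd_mult scales the Trainer's global wd for this parameter only;
            # shape=(hidden_units, 0) leaves the input dimension to be inferred
            # on the first forward pass, hence allow_deferred_init=True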
            self.weights = self.params.get('weights',
                                           wd_mult=wd_mult,
                                           shape=(hidden_units, 0),
                                           allow_deferred_init=True)

    def hybrid_forward(self, F, x, weights):
        weighted_data = F.FullyConnected(x,
                                         weights,
                                         num_hidden=self.weights.shape[0],
                                         no_bias=True)
        return weighted_data


hidden_units = 6
net = RegularizedFullyConnected(hidden_units, wd_mult=0.1)
net.initialize(mx.init.Uniform())

x = mx.random.normal(shape=(1, 2))
label = mx.random.normal(shape=(hidden_units,))

l2_loss = L2Loss()
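# the effective weight decay applied to 'weights' is wd * wd_mult = 1 * 0.1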
trainer = Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1, 'wd': 1})

with autograd.record():
    out = net(x)
    loss = l2_loss(out, label)

loss.backward()
trainer.step(1)
print(net.weights.data())

If you change wd_mult to something else, you will see that the printed output is different.
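
If you would rather mirror your TensorFlow snippet literally, an alternative (a sketch, continuing from the example above) is to add the L2 penalty to the loss yourself instead of relying on the optimizer's weight decay:

# alternative sketch: explicit L2 penalty on just this parameter, as in the
# TensorFlow code; with plain SGD this matches wd * wd_mult = 0.1, since
# d/dw [0.1 * 0.5 * sum(w^2)] = 0.1 * w
trainer = Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1, 'wd': 0})

with autograd.record():
    out = net(x)
    penalty = 0.1 * 0.5 * (net.weights.data() ** 2).sum()
    loss = l2_loss(out, label) + penalty

loss.backward()
trainer.step(1)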
