Updating the parameters of HybridBlocks


Can someone please point me at an example that shows the computation of gradients using HybridBlocks and how the parameters of HybridBlocks can be updated using those gradients?

I did read the “…HybridBlocks” chapter of the “MXNet Straight Dope” book, but it does not have any example code showing how to adjust the parameters of hybrid blocks based on gradients.

Here is one specific question I currently have:

Can the weight and bias parameters in custom layers (HybridBlocks) be NDArrays? If not, how do we mark that we want the gradients for those parameters? As far as I can see, “attach_grad” is only available for NDArrays, not for symbols.

I am currently creating the weight and bias parameters in my custom hybrid block as follows:

    self.W = mx.sym.random_uniform(low=-W_bound, high=W_bound, shape=(numHeights, numOutputs, numInputs))
    self.b = mx.sym.zeros((numOutputs,))

I need to get the gradient of the loss with respect to W and b and then update W and b.



To define your own HybridBlock, say Resnet-18, you can do

from mxnet import init  # needed for init.Xavier() in get_net below
from mxnet import nd
from mxnet.gluon import nn

class Residual(nn.HybridBlock):
    def __init__(self, channels, same_shape=True, **kwargs):
        super(Residual, self).__init__(**kwargs)
        self.same_shape = same_shape
        with self.name_scope():
            strides = 1 if same_shape else 2
            self.conv1 = nn.Conv2D(channels, kernel_size=3, padding=1,
                                   strides=strides)
            self.bn1 = nn.BatchNorm()
            self.conv2 = nn.Conv2D(channels, kernel_size=3, padding=1)
            self.bn2 = nn.BatchNorm()
            if not same_shape:
                # 1x1 convolution so the shortcut matches the downsampled output
                self.conv3 = nn.Conv2D(channels, kernel_size=1,
                                       strides=strides)

    def hybrid_forward(self, F, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        if not self.same_shape:
            x = self.conv3(x)
        return F.relu(out + x)

class ResNet(nn.HybridBlock):
    def __init__(self, num_classes, verbose=False, **kwargs):
        super(ResNet, self).__init__(**kwargs)
        self.verbose = verbose
        with self.name_scope():
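            net = self.net = nn.HybridSequential()
            net.add(nn.Conv2D(channels=32, kernel_size=3, strides=1, padding=1))
            # The loop bodies below are assumed (same-shape blocks at the
            # current width); they were cut off in the post.
            for _ in range(3):
                net.add(Residual(channels=32))
            net.add(Residual(channels=64, same_shape=False))
            for _ in range(2):
                net.add(Residual(channels=64))
            net.add(Residual(channels=128, same_shape=False))
            for _ in range(2):
                net.add(Residual(channels=128))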

    def hybrid_forward(self, F, x):
        out = x
        for i, b in enumerate(self.net):
            out = b(out)
            if self.verbose:
                print('Block %d output: %s'%(i+1, out.shape))
        return out

def get_net(ctx):
    num_outputs = 10
    net = ResNet(num_outputs)
    net.initialize(ctx=ctx, init=init.Xavier())
    return net

To hybridize and infer model parameters automatically, do

net = get_net(ctx)
net.hybridize()  # compile the network into a symbolic graph
train(net, train_data, valid_data, num_epochs, learning_rate,
      weight_decay, ctx, lr_period, lr_decay)

The complete code is in the following tutorial (it will be merged into the English tutorial).


For other models, you may tweak the above code.


The “Residual” block there seems to only use Gluon’s built-in blocks and not have any custom parameters defined.

Here is part of the HybridBlock code I have:

class MyDenseLayer(gluon.HybridBlock):

    def __init__(self, numSlices, numInputs, numOutputs, batchSize, gamma=0.01, **kwargs):
        super(MyDenseLayer, self).__init__(**kwargs)
        with self.name_scope():
            W_bound = some_expression_involving_numInputs_and_numOutputs
            self.W = mx.sym.random_uniform(low=-W_bound, high=W_bound, shape=(numSlices, numOutputs, numInputs))
            self.b = mx.sym.zeros((numOutputs,))

            self.batchSize = batchSize
            self.gamma = gamma


class MyNetwork(gluon.HybridBlock):
    def __init__(self, input_shape=1, output_shape=1, dropout=0.7):
        # ... blah blah ...
        self.L0 = MyDenseLayer(self.nslices, numInputs, 256, 64, gamma=0.01)

net = MyNetwork()
  • Notice that W and b of MyDenseLayer are declared via mx.sym. How do I indicate that I need the gradients with respect to these (the same way attach_grad does for NDArrays)?
  • How can we access those gradients and write the weight-update code? That is, how would I do something like “net.L0.W = net.L0.W - learningRate * net.L0.W.grad”?
  • For “net.hybridize()” to work, is it necessary that W and b are declared as symbols? Or is it OK for them to be NDArrays?



Suppose you want to define a customized layer using gluon and wish to update its parameters.

from mxnet import nd
from mxnet.gluon import nn

class CenteredLayer(nn.Block):
    def __init__(self, **kwargs):
        super(CenteredLayer, self).__init__(**kwargs)

    def forward(self, x):
        return x - x.mean()

layer = CenteredLayer()
layer(nd.array([1, 2, 3, 4, 5]))

[-2. -1.  0.  1.  2.]
<NDArray 5 @cpu(0)>

To define a net:

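net = nn.Sequential()
with net.name_scope():
    # Layer sizes are assumed; the body of this block was cut off in the post.
    net.add(nn.Dense(128))
    net.add(nn.Dense(10))
    net.add(CenteredLayer())
net.initialize()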

We can verify its mean is 0:

y = net(nd.random.uniform(shape=(4, 8)))
y.mean()

[  2.32830647e-11]
<NDArray 1 @cpu(0)>

To access and update parameters and grads in the customized layer during training, you need gluon.Parameter()

from mxnet import gluon
my_param = gluon.Parameter("exciting_parameter_yay", shape=(3,3))
my_param.initialize()  # allocates the data and gradient arrays
(my_param.data(), my_param.grad())

(
 [[ 0.02332029  0.04696382  0.03078182]
  [ 0.00755873  0.03193929 -0.0059346 ]
  [-0.00809445  0.01710822 -0.03057443]]
 <NDArray 3x3 @cpu(0)>,
 [[ 0.  0.  0.]
  [ 0.  0.  0.]
  [ 0.  0.  0.]]
 <NDArray 3x3 @cpu(0)>)

If you need a HybridBlock, make the corresponding changes; the idea is the same.

More details:


See this tutorial: http://gluon.mxnet.io/chapter03_deep-neural-networks/custom-layer.html#Craft-a-bespoke-fully-connected-gluon-layer

You need to create the parameters with self.params.get, not with mx.sym.Variable.


Can I add something to this question?

Say that inside a custom HybridBlock, I have

self.fc = nn.Dense(…)

in the constructor, and then in hybrid_forward, I’d like to define a regularization term on the weights of self.fc.
How do I best access the weights as Symbol/NDArray, depending on what F is?

From the docs, it seems something like this could work:

if F.__name__ == 'mxnet.ndarray':
    wgts = self.fc.weight.data()
else:
    wgts = self.fc.weight.var()

But this seems ugly to me. Is there a nicer way to do this?




Unfortunately that’s how it works for now. You can make a wrapper for this.