Activation function with learnable parameter (Solved)


#1

Hi, I’m very new to MXNet & Gluon and have been trying to understand how to write an activation function with a learnable parameter. Does anyone have a PReLU-like code example they can share to accelerate my understanding?

Many thanks in advance!


#2

Does this help? http://gluon.mxnet.io/chapter03_deep-neural-networks/custom-layer.html


#3

Thanks! Let me see if I got this right by adding an “alpha” parameter to the CenteredLayer example listed there:

class CenteredLayer(Block):
    def __init__(self, **kwargs):
        super(CenteredLayer, self).__init__(**kwargs)

    def forward(self, x):
        return x - nd.mean(x)

Let's say I want a parameter "alpha" to start at 1.0 and then be learned, so that the layer returns x - alpha*nd.mean(x).

Is the following correct?

class CenteredAlpha(Block):
    def __init__(self, units, in_units=0, **kwargs):
        super(CenteredAlpha, self).__init__(**kwargs)
        with self.name_scope():
            self._units = units
            self._in_units = in_units
            #################
            # We add the required parameters to the ``Block``'s ParameterDict
            #################
            self.alpha = self.params.get(
                'alpha', init=mx.initializer.One(),
                shape=(in_units, units))


    def forward(self, x):
        with x.context:
            return x - self.alpha.data()*nd.mean(x)

One more quick question: how do I print the alphas for the above definition in order to check that they are evolving? Sorry, I’m new to this and still a bit confused.


#4

I wrote the following test which is closer to what I’m trying to do which is to have a learnable parameter in a custom activation function.

I created a “Prlu” activation function (like PReLU, but just a simple example). Basically I want alpha to start at a given value (1.0 in this example) and then be learned from there…

class Prlu(Block):
    def __init__(self, units, **kwargs):
        super(Prlu, self).__init__(**kwargs)
        with self.name_scope():
            self.units = units
            #################
            # We add the required parameters to the ``Block``'s ParameterDict
            # want alpha set to 1.0, then learned after that.
            #################
            self.alpha = self.params.get(
                'alpha', init=mx.initializer.One(),
                shape=(units,))

    def forward(self, x):
        # note: Python's builtin max() is not elementwise; use nd.maximum instead
        return nd.maximum(x, self.alpha.data()*x)

Then create a net:

net = gluon.nn.Sequential()
with net.name_scope():
    #conv 1 & 2
    net.add(gluon.nn.Conv2D(channels=K1, kernel_size=3))
    net.add(Prlu(K1))
    net.add(gluon.nn.Conv2D(channels=K2, kernel_size=3))
    net.add(Prlu(K2))
etc...

net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)

It seems the above will initialize all parameters, including my alphas (though I’d prefer the alphas start at 1.0 and only the weights and biases get Xavier-initialized).

Checking my parameters they seem ok…
print(net.collect_params())
yields the following…

sequential0_ (
Parameter sequential0_conv0_weight (shape=(32, 0, 3, 3), dtype=<class ‘numpy.float32’>)
Parameter sequential0_conv0_bias (shape=(32,), dtype=<class ‘numpy.float32’>)
Parameter sequential0_prlu0_alpha (shape=(32,), dtype=<class ‘numpy.float32’>)
Parameter sequential0_conv1_weight (shape=(32, 0, 3, 3), dtype=<class ‘numpy.float32’>)
Parameter sequential0_conv1_bias (shape=(32,), dtype=<class ‘numpy.float32’>)
Parameter sequential0_prlu1_alpha (shape=(32,), dtype=<class ‘numpy.float32’>)

When I start my training I get:
[15:45:50] /Users/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [15:45:50] src/operator/tensor/./elemwise_binary_broadcast_op.h:66: Check failed: l == 1 || r == 1 operands could not be broadcast together with shapes (32,) (64,32,26,26)

The simple MNIST code gets no errors with an activation function that uses a straightforward, parameter-free calculation.

I get that I seem to be mixing up my parameter shapes, but I can’t see a way to un-confuse my newbie mind. Any guidance?


#5

You are trying to multiply a (32,) vector with a (64, 32, 26, 26) tensor, which doesn’t work.

To use broadcasting, you need to define your alpha to have shape (32, 1, 1).
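
NumPy follows the same broadcasting rules, so the mismatch can be reproduced without MXNet (shapes taken from the error message above):

```python
import numpy as np

x = np.ones((64, 32, 26, 26))        # (batch, channels, height, width)
alpha = np.ones(32)                  # one alpha per channel

# (32,) aligns against the *trailing* axis (width = 26) -> incompatible
try:
    _ = alpha * x
except ValueError as err:
    print("broadcast failed:", err)

# (32, 1, 1) aligns against (channels, height, width) -> broadcasts fine
out = alpha.reshape(32, 1, 1) * x
print(out.shape)                     # (64, 32, 26, 26)
```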


#6

I changed the shape of alpha as pilswrong suggested, but it seems I have not done it correctly. I’m unclear how to fix it and would appreciate advice.

class Prlu(Block):
    def __init__(self, units, **kwargs):
        super(Prlu, self).__init__(**kwargs)
        with self.name_scope():
            self.units = units
            #################
            # We add the required parameters to the ``Block``'s ParameterDict
            # want alpha set to 1.0, then learned after that.
            #################
            self.alpha = self.params.get(
                'alpha', init=mx.initializer.One(),
                shape=(units, 1, 1))

    # __subex will become a more complicated formula
    def __subex(self, x):
        return ((x>=0)+(x<0)*self.alpha.data())

    def forward(self, x):
        return (x*self.__subex(x))

When I define the network in the way that makes sense to me, specifying the width of the fully connected layer with “net.add(Prlu(128))”, I get:

Traceback (most recent call last):
  File "gluon-mnist-v5.py", line 145, in 
    loss = softmax_cross_entropy(output, label)
  File "/Users/bc/mxnet/lib/python3.6/site-packages/mxnet/gluon/block.py", line 290, in __call__
    return self.forward(*args)
  File "/Users/bc/mxnet/lib/python3.6/site-packages/mxnet/gluon/block.py", line 474, in forward
    return self.hybrid_forward(ndarray, x, *args, **params)
  File "/Users/bc/mxnet/lib/python3.6/site-packages/mxnet/gluon/loss.py", line 314, in hybrid_forward
    loss = -F.pick(pred, label, axis=self._axis, keepdims=True)
  File "", line 76, in pick
  File "/Users/bc/mxnet/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/Users/bc/mxnet/lib/python3.6/site-packages/mxnet/base.py", line 146, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Shape inconsistent, Provided=(64,), inferred shape=(128,)

but then when I define “net.add(Prlu(64))” I get:

Traceback (most recent call last):
  File "gluon-mnist-v5.py", line 144, in 
    output = net(data)
  File "/Users/bc/mxnet/lib/python3.6/site-packages/mxnet/gluon/block.py", line 290, in __call__
    return self.forward(*args)
  File "/Users/bc/mxnet/lib/python3.6/site-packages/mxnet/gluon/nn/basic_layers.py", line 50, in forward
    x = block(x)
  File "/Users/bc/mxnet/lib/python3.6/site-packages/mxnet/gluon/block.py", line 290, in __call__
    return self.forward(*args)
  File "/Users/bc/mxnet/lib/python3.6/site-packages/mxnet/gluon/block.py", line 474, in forward
    return self.hybrid_forward(ndarray, x, *args, **params)
  File "/Users/bc/mxnet/lib/python3.6/site-packages/mxnet/gluon/nn/basic_layers.py", line 201, in hybrid_forward
    flatten=self._flatten, name='fwd')
  File "", line 74, in FullyConnected
  File "/Users/bc/mxnet/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/Users/bc/mxnet/lib/python3.6/site-packages/mxnet/base.py", line 146, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Shape inconsistent, Provided=(10,8192), inferred shape=(10,4096)
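
For what it’s worth, these new errors are consistent with the same broadcasting rules: against a 2-D dense output of shape (batch, units), an alpha of shape (units, 1, 1) does not fail outright but silently broadcasts into a 3-D tensor, and the inflated shape then seems to confuse the downstream layers’ shape inference. A NumPy sketch with illustrative shapes:

```python
import numpy as np

x = np.ones((64, 128))               # dense output: (batch, units)
alpha = np.ones((128, 1, 1))         # the (units, 1, 1) shape used above

out = alpha * x                      # silently broadcasts to 3-D
print(out.shape)                     # (128, 64, 128) -- not (64, 128)!

# for 2-D input, alpha needs shape (units,) (or (1, units)) instead
ok = np.ones(128) * x
print(ok.shape)                      # (64, 128)
```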

#7

I have found the correct way to write this function. The eventual intent is to build a much more interesting function, but I started simple by trying to replicate a learnable leaky ReLU (PReLU) in Gluon. It is simplistic and not optimal, but I present it here so that others may learn from it:

class Prlu(Block):
    def __init__(self, units, **kwargs):
        super(Prlu, self).__init__(**kwargs)
        with self.name_scope():
            self.units = units
            #################
            # creating our own activation function Prlu, like PReLU = max(x, alpha*x)
            # we want learnable alpha to start at 0.25
            # We add the alpha parameter to the ``Block``'s ParameterDict
            # this was created as a test so we can later make a more complicated function
            #################
            self.alpha = self.params.get(
                'alpha', init=mx.initializer.Constant(0.25),
                shape=(units,))
                
    # __subex is not needed for this simple example, but is included here for future work.
    # There is one learnable alpha for every node in a layer, but the
    # pre-activation tensor may have a shape like (batch, node, x, y);
    # for alpha(node) to broadcast, we reshape it to alpha(1, node, 1, 1).
    def __subex(self, x):
        new_shape = [1]*len(x.shape)
        new_shape[1] = self.alpha.shape[0]
        aa = self.alpha.data().reshape(new_shape).broadcast_to(x.shape)
        return ((x>=0)+(x<0)*aa)

    def forward(self, x):
        return (x*self.__subex(x))
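
The reshape logic in __subex can be sanity-checked with plain NumPy, which follows the same broadcasting rules (prlu_numpy is an illustrative mirror of the code above):

```python
import numpy as np

def prlu_numpy(x, alpha):
    # mirror __subex: reshape alpha from (units,) to (1, units, 1, ..., 1)
    new_shape = [1] * x.ndim
    new_shape[1] = alpha.shape[0]
    aa = alpha.reshape(new_shape)
    # (x >= 0) contributes 1 for positives; (x < 0) * aa contributes alpha for negatives
    return x * ((x >= 0) + (x < 0) * aa)

x = np.array([[[[-2.0]], [[3.0]]]])  # shape (1, 2, 1, 1): one negative, one positive unit
alpha = np.full(2, 0.25)

# negative unit is scaled by 0.25, positive unit passes through unchanged
print(prlu_numpy(x, alpha).ravel())
```

The same function also works on 2-D dense outputs of shape (batch, units), since `new_shape` then becomes `[1, units]`.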