Softmax block in Gluon


The tutorial doesn’t explain how to get a “pure” Softmax block in a Gluon model.

“Note, we didn’t have to include the softmax layer because MXNet’s has an efficient function that simultaneously computes the softmax activation and cross-entropy loss. However, if ever need to get the output probabilities,”

it seems like the last sentence is not complete. Is the only way to build a custom block and use F.softmax(x) in the hybrid_forward method?


>>> import mxnet as mx
>>> from mxnet import ndarray as nd
>>> x = nd.array([1, 2, 3, 4])
>>> y = nd.softmax(x)
>>> y

[ 0.0320586   0.08714432  0.23688284  0.64391428]
<NDArray 4 @cpu(0)>
>>> sum(y)

[ 1.]
<NDArray 1 @cpu(0)>
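For reference, the same normalization can be reproduced with plain NumPy (a sketch, independent of MXNet): softmax exponentiates each element and divides by the sum of the exponentials, usually subtracting the max first for numerical stability.

```python
import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability;
    # the result is unchanged because softmax is invariant to adding
    # a constant to every input.
    z = np.exp(x - np.max(x))
    return z / z.sum()

y = softmax(np.array([1.0, 2.0, 3.0, 4.0]))
print(y)        # ≈ [0.0321, 0.0871, 0.2369, 0.6439]
print(y.sum())  # 1.0
```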


Thanks astonzhang!

I was actually looking for an existing softmax block to use in Gluon models. In the meantime I am using a little custom one:

from mxnet.gluon import HybridBlock

class Softmax(HybridBlock):

    def __init__(self, **kwargs):
        super(Softmax, self).__init__(**kwargs)

    def hybrid_forward(self, F, x):
        return F.softmax(x)


@avolozin this seems useful enough to be directly added to gluon.
Would you be ok with creating a PR?


Thanks @madjam! I will need to check out the code and set up a dev env - might have time toward the end of this week, quite swamped now :frowning:

Another option would be to add ‘softmax’ as a new type in mxnet.gluon.nn.Activation. What do you think?


I like that idea. Keras does this in a similar fashion.


Thanks! I’ll ping you when I get some time to implement this