Question about the example of a custom Symbol op


In this example:

It says: "need_top_grad=False because softmax is a loss layer and you don’t need gradient input from preceding layers."

As far as I understand, this softmax op only normalizes the input into values between 0 and 1 that sum to 1; the cross-entropy loss is never calculated. So shouldn't this op be treated as an ordinary layer rather than a loss layer, with need_top_grad set to True?
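To make the question concrete, here is a minimal plain-Python sketch of what a softmax-only forward pass computes (this is not the example's code, which isn't reproduced here). The input is turned into a probability distribution, and no label or loss value appears anywhere. The function name `softmax` and the sample logits are illustrative assumptions.

```python
import math

def softmax(x):
    """Numerically stable softmax over a single row of logits."""
    m = max(x)  # subtract the max so exp() cannot overflow
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print(probs)       # every value lies in (0, 1)
print(sum(probs))  # approximately 1.0; note that no label or loss is involved
```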


need_top_grad indicates whether the operator needs the gradient from the layer above it (the top gradient) during the backward pass. Since this layer sits at the end of the network, there is no layer above it, so no top gradient is needed. You are right that in this example the operator only computes the softmax and not the cross entropy.
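In chain-rule terms (a general sketch, not taken from the example), an intermediate layer $y = f(x)$ can only produce its input gradient if the layer above supplies $\partial L / \partial y$:

```latex
\frac{\partial L}{\partial x}
  = \underbrace{\frac{\partial L}{\partial y}}_{\text{top grad}}
    \cdot \frac{\partial y}{\partial x}
```

For the last layer of the network, nothing above it supplies $\partial L / \partial y$, so its backward pass must compute $\partial L / \partial x$ directly from its own outputs and the label. That is the situation need_top_grad=False describes.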


Thanks for your reply.
I think there is a trick in this Softmax layer: the forward pass doesn’t calculate the cross entropy, but the backward pass computes the gradient as if a cross-entropy loss had been applied on top of the softmax.
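The trick can be checked numerically. The sketch below assumes the example's backward sets the input gradient to the softmax output and then subtracts 1 at the label index, i.e. $y - \mathrm{onehot}(t)$. It then compares that result with a finite-difference gradient of the cross-entropy loss $-\log \mathrm{softmax}(x)_t$, a loss the forward pass never computes. All function names here (`forward`, `backward`, `cross_entropy`, `numerical_grad`) are hypothetical and written in plain Python rather than the MXNet operator API.

```python
import math

def softmax(x):
    m = max(x)  # numerically stable softmax over one row
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x):
    # Forward pass: softmax only, no loss is computed
    return softmax(x)

def backward(y, label):
    # Backward pass: no top gradient is used; returns y - onehot(label),
    # which equals d(cross-entropy)/d(logits) when y = softmax(logits)
    dx = list(y)
    dx[label] -= 1.0
    return dx

def cross_entropy(x, label):
    # The loss the backward pass implicitly assumes: -log softmax(x)[label]
    return -math.log(softmax(x)[label])

def numerical_grad(f, x, eps=1e-6):
    # Central finite differences, used only to check backward()
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        grad.append((f(xp) - f(xm)) / (2 * eps))
    return grad

logits = [2.0, 1.0, 0.1]
label = 0
analytic = backward(forward(logits), label)
numeric = numerical_grad(lambda z: cross_entropy(z, label), logits)
print(analytic)
print(numeric)  # should agree with analytic to roughly 1e-9
```

The two gradients match, which is why the layer can act as a loss layer even though its forward output is just the softmax probabilities.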