On using SoftmaxOutput with single output

When using the SoftmaxOutput layer, if the parameter multi_output is set to False, the input data and output data are reshaped into 2-D tensors:
```cpp
Tensor<xpu, 2, DType> data = in_data[softmaxfocalout_enum::kData].get_with_shape<xpu, 2, DType>(s2, s);
Tensor<xpu, 2, DType> out = out_data[softmaxfocalout_enum::kOut].get_with_shape<xpu, 2, DType>(s2, s);
```
Then Softmax(out, data) is applied.
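For intuition, here is roughly what that reshape-plus-softmax amounts to, sketched in plain numpy (the shapes and names are my own illustration, not taken from the MXNet source):

```python
import numpy as np

def softmax_rows(x):
    # Numerically stable softmax along the last axis.
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# With multi_output=False, an input of shape (batch, num_classes)
# is treated as one distribution per example.
data = np.random.randn(4, 6)              # 4 examples, 6 classes
out = softmax_rows(data)
assert np.allclose(out.sum(axis=1), 1.0)  # each row sums to 1
```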
My question is:

  1. Why use a softmax function rather than a sigmoid function, which usually outputs a single value?
  2. Why is the label’s shape Tensor<xpu, 1, DType>, rather than Tensor<xpu, 2, DType>, the same shape as the input data?
  3. I would like to use it for multi-label image classification. Normally, the image data is in BCHW format, so the 2-D tensor’s shape would be B × CHW, which may lead to a speed problem: in the backward pass, the grid dimension (GridDim) may be too small when the batch size is small.
  • Sigmoid is an activation function: it maps values from (-inf, inf) to values between 0 and 1, independently for each element. Softmax transforms an array of floats into a probability distribution that sums to 1. They are completely different things. You can use sigmoid (with a binary cross-entropy) as a loss for binary classification problems, whereas you need cross-entropy on top of softmax to use it as a loss function; softmax_cross_entropy is a commonly used loss function for multi-class classification. The sketch below illustrates the difference.
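A small numpy sketch of the contrast (my own illustration, not MXNet code):

```python
import numpy as np

def sigmoid(x):
    # Element-wise squashing to (0, 1); outputs are independent.
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Converts the whole vector into a probability distribution.
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.5])

print(sigmoid(logits))   # ~[0.881 0.269 0.622]; does NOT sum to 1
p = softmax(logits)
print(p, p.sum())        # one distribution over 3 classes; sums to 1.0
```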

  • The label shape is 1-D because it is easier to specify class indices as labels. For example, [3, 5, 1] means the first example belongs to class 3, the second belongs to class 5, and the third belongs to class 1. If the label were 2-D, it would have to be written as a one-hot array, like [[0,0,0,1,0,0], [0,0,0,0,0,1], [0,1,0,0,0,0]]. If you have a 2-D label, you can use the argmax function to make it 1-D, as in the sketch below. That said, it would be nice to have both options, the way gluon’s SoftmaxCrossEntropyLoss handles it.
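To make the index-vs-one-hot point concrete, a short numpy sketch (my own illustration, assuming 6 classes):

```python
import numpy as np

# Index labels: one class id per example (the 1-D form).
label_1d = np.array([3, 5, 1])

# The equivalent one-hot encoding (the 2-D form, num_classes = 6).
label_2d = np.eye(6)[label_1d]
# [[0. 0. 0. 1. 0. 0.]
#  [0. 0. 0. 0. 0. 1.]
#  [0. 1. 0. 0. 0. 0.]]

# Going back from one-hot to indices with argmax:
recovered = label_2d.argmax(axis=1)
assert (recovered == label_1d).all()
```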

I’m not sure why there would be a speed problem. Could you please explain?