I know that SoftmaxCrossEntropyLoss(from_logits = False) applies log-softmax to the output of our linear layer and then computes the cross-entropy loss, which makes perfect sense. And if we want to pass an already-normalized output (i.e. log-probabilities, e.g. from log_softmax), we can set from_logits = True and everything works fine.
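For concreteness, here's a minimal sketch of the two modes (the logits array and label are made up for illustration); both calls should produce the same loss value:

```python
from mxnet import nd
from mxnet.gluon.loss import SoftmaxCrossEntropyLoss

# Made-up raw scores ("logits") for one sample over 3 classes
logits = nd.array([[2.0, -1.0, 0.5]])
label = nd.array([0])

# Default (from_logits=False): the loss applies log-softmax internally
loss_fn = SoftmaxCrossEntropyLoss()
print(loss_fn(logits, label))

# from_logits=True: we normalize ourselves and pass log-probabilities
log_probs = nd.log_softmax(logits)
loss_fn_logits = SoftmaxCrossEntropyLoss(from_logits=True)
print(loss_fn_logits(log_probs, label))
# Both print the same loss value
```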
My question is: why is from_logits False by default? Wouldn't it be better if it were True by default, so that every time I make a prediction (using a model whose output layer is a softmax), the model returns probabilities instead of raw linear scores?
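To illustrate what I mean by "probabilities at prediction time" (again with made-up numbers), with the default setup I have to recover them manually at inference:

```python
from mxnet import nd

# Made-up raw scores from the network's final Dense layer
logits = nd.array([[2.0, -1.0, 0.5]])

# Since the model outputs raw logits, I apply softmax myself at inference
probs = nd.softmax(logits)
print(probs)  # rows sum to 1 along the class axis
```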
I know I can get this behavior by just setting from_logits = True in the loss and then training, but is there an important reason behind the default, if any?
BTW, PyTorch does the same as MXNet (nn.CrossEntropyLoss also expects raw logits by default), so there's got to be a legitimate and intuitive reason.