Mod.score inconsistency between LogisticRegressionOutput and SoftmaxOutput


#1

Redirected (twice) from a GitHub bug report. I still think it is a bug:

TL;DR:

LogisticRegressionOutput (with 1 fc) and SoftmaxOutput (with 2 fcs) are mostly interchangeable in training code. However, mod.score fails on LogisticRegressionOutput. I discovered the problem while following the example from https://xiandong79.github.io/MXNet-Logistic-Regression-Example


#2

Hi @yifeim,

You are using the wrong metric. You don't want to compute the accuracy the way the built-in metric does; instead, you want to check whether the output is above or below 0.5 and compare that with the label.
Here is the fixed code:

import mxnet as mx
import numpy as np

# `data` and `label` are the arrays from the original example.
sym = mx.sym.LogisticRegressionOutput(
    mx.sym.FullyConnected(data=mx.sym.Variable('data'), num_hidden=1, name='fc'),
    label=mx.sym.Variable('softmax_label')
)
train_iter = mx.io.NDArrayIter(data, label, 100)
mod = mx.mod.Module(symbol=sym)
# Threshold the sigmoid output at 0.5 before comparing it with the label.
metric = mx.metric.CustomMetric(lambda label, y: np.mean((y[:, -1] > .5) == label), name='Real-Accuracy')
mod.fit(train_iter, eval_metric=metric, num_epoch=5,
        optimizer='adam', optimizer_params={'learning_rate': 1.})
# Both lines should report the same accuracy.
print(mod.score(train_iter, metric))
print(np.mean((mod.predict(train_iter).asnumpy()[:, -1] > .5) == label))

INFO:root:Epoch[0] Train-Real-Accuracy=0.360000
INFO:root:Epoch[0] Time cost=0.002
INFO:root:Epoch[1] Train-Real-Accuracy=0.770000
INFO:root:Epoch[1] Time cost=0.001
INFO:root:Epoch[2] Train-Real-Accuracy=0.950000
INFO:root:Epoch[2] Time cost=0.001
INFO:root:Epoch[3] Train-Real-Accuracy=0.990000
INFO:root:Epoch[3] Time cost=0.001
INFO:root:Epoch[4] Train-Real-Accuracy=0.970000
INFO:root:Epoch[4] Time cost=0.001

[('Real-Accuracy', 0.96999999999999997)]
0.97

#3

Hi Thomas,

Thanks for looking into it. I am aware of CustomMetric and mx.metric.np, but I do not see why ‘acc’ is the wrong metric. It is quite confusing that ‘acc’ works with SoftmaxOutput but not with LogisticRegressionOutput. It is also very confusing that the ‘CrossEntropy’ metric works fine with LogisticRegressionOutput but throws errors on SoftmaxOutput.

At a deeper level, a Loss module outputs a prediction in the forward pass and the gradient of the actual loss in the backward pass, which is quite unintuitive. So mod.score has to convert the prediction to an actual loss. While the conversion works fine with the softmax prediction, it fails with the logistic regression prediction. All of this looks like a bug to me.
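To make the forward/backward point concrete, here is a minimal NumPy sketch of that pattern. It is not MXNet's code; the function names `forward`, `backward`, and `loss` are made up for illustration, and it assumes the loss behind a sigmoid output layer is binary cross-entropy. The forward pass returns only the prediction, the backward pass returns the gradient of a loss that is never computed explicitly, and a finite-difference check confirms the two belong together.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(z):
    # The "loss layer" returns the prediction p, not a loss value.
    return sigmoid(z)

def backward(z, y):
    # Gradient of binary cross-entropy with respect to the pre-activation z.
    return sigmoid(z) - y

def loss(z, y):
    # The loss itself, which the layer never exposes in this pattern.
    p = sigmoid(z)
    return -y * np.log(p) - (1 - y) * np.log(1 - p)

# Hypothetical pre-activations and labels.
z = np.array([-1.0, 0.5, 2.0])
y = np.array([0.0, 1.0, 1.0])

# Central finite differences of the loss, compared with the backward output.
eps = 1e-6
numeric = (loss(z + eps, y) - loss(z - eps, y)) / (2 * eps)
analytic = backward(z, y)
print(np.allclose(numeric, analytic, atol=1e-6))
```

Anything that wants an actual loss or accuracy number therefore has to reconstruct it from the prediction, which is where the metric comes in.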


#4

Btw, while I totally appreciate your solution and kind attention, what I really meant was that having to write a custom loss function for binary classification seems quite substandard to me.


#5

@yifeim

I see. When you use a sigmoid output, your network outputs a single scalar, so its shape matches the label. Looking at the Accuracy metric code, the prediction is then converted to int32 and compared with the label. For values between 0 and 1, the conversion to int32 is effectively a floor() operation. If it were a round() instead, it would solve your issue.

edit: I agree with your more high-level remark. That is why I prefer using the Gluon API and stay away from fit. In Gluon you can easily decouple the output of your network from the loss computation and do the evaluation as you see fit.
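A small NumPy sketch of the floor-versus-round point, using made-up probabilities and labels rather than the data from this thread. It only illustrates what the integer cast does to sigmoid outputs; it is not the Accuracy metric's actual code.

```python
import numpy as np

# Hypothetical sigmoid outputs for four samples and their binary labels.
pred = np.array([0.1, 0.4, 0.6, 0.9])
label = np.array([0, 0, 1, 1])

# Casting to int32 truncates toward zero, so every probability below 1.0 becomes 0.
truncated = pred.astype('int32')
acc_truncated = float(np.mean(truncated == label))

# Rounding first maps probabilities above 0.5 to class 1.
rounded = np.round(pred).astype('int32')
acc_rounded = float(np.mean(rounded == label))

print(truncated.tolist(), acc_truncated)  # [0, 0, 0, 0] 0.5
print(rounded.tolist(), acc_rounded)      # [0, 0, 1, 1] 1.0
```

With the cast alone, the metric only ever credits the negative class, so the reported accuracy is just the fraction of negative labels.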


#6

Awesome. This is some great inside knowledge. How should I take it further? Or would you help me with a quick check in? I am trying to do gluon for future experiments. I currently depend on module for sparse variables. Hopefully that gets carried over soon:)


#7

Sure, feel free to post more questions on the forum :smile:
@eric-haibin-lin is working on a proposal to implement sparse tensor in Gluon. You can see his proposal here: https://cwiki.apache.org/confluence/display/MXNET/Gluon+Sparse+Support

Gluon resources:
Crash course: http://gluon-crash-course.mxnet.io/
Tutorials: http://mxnet.incubator.apache.org/tutorials/index.html


#8

To elaborate on your initial question: "but I do not see the argument why ‘acc’ is a wrong metric"
You are using a LogisticRegressionOutput, which is suited to regression tasks, with the Accuracy metric, which is suited to classification tasks. Hence the mismatch, the unexpected output, and the need for an explicit conversion between the network output and the label format.


#9

Thanks, but I still disagree with the explanation.

Logistic regression always outputs a value in [0,1]. This value, p, is often interpreted as the probability of the positive class in binary classification. Correspondingly, 1-p is the probability of the negative class.

Training logistic regression models is similar to training multi-class softmax models, in that it also assumes a (binary) cross entropy objective:

min -y*log(p)-(1-y)*log(1-p)
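As a quick check of that similarity, here is a NumPy sketch with made-up probabilities (the helper names are hypothetical, not from MXNet or sklearn). It evaluates the binary objective above and the multi-class cross-entropy on the equivalent two-column distribution [1-p, p], and the two agree.

```python
import numpy as np

def binary_cross_entropy(p, y):
    # Mean of -y*log(p) - (1-y)*log(1-p) over the samples.
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p)))

def softmax_cross_entropy(probs, y):
    # Mean negative log-probability assigned to the true class.
    probs = np.asarray(probs, dtype=float)
    y = np.asarray(y, dtype=int)
    return float(np.mean(-np.log(probs[np.arange(len(y)), y])))

# Hypothetical positive-class probabilities and binary labels.
p = np.array([0.9, 0.2, 0.7])
y = np.array([1, 0, 1])

# The same predictions written as a two-class distribution.
two_class = np.stack([1 - p, p], axis=1)

bce = binary_cross_entropy(p, y)
sce = softmax_cross_entropy(two_class, y)
print(abs(bce - sce) < 1e-12)
```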

I think using logistic regression for binary classification is expected. In fact, sklearn API supports crossentropy and accuracy for both logistic regression and multi-class softmax. Examples:

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html

To include both features, sklearn actually identifies them separately through shape[1]. Using np.round for accuracy is a nice trick, but may not be the best solution for clarity purposes.
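For what it's worth, a shape-based dispatch could look roughly like the sketch below. This is a hypothetical helper written for this thread, not sklearn's or MXNet's implementation; the `threshold` parameter and the rule "one column means sigmoid output" are my assumptions.

```python
import numpy as np

def accuracy(label, pred, threshold=0.5):
    """Accuracy for either a sigmoid output (N,) / (N, 1) or a softmax output (N, C)."""
    label = np.asarray(label).ravel().astype('int64')
    pred = np.asarray(pred)
    if pred.ndim == 1 or pred.shape[1] == 1:
        # Single column: treat it as P(class 1) and threshold it.
        pred_label = (pred.ravel() > threshold).astype('int64')
    else:
        # Several columns: pick the most probable class.
        pred_label = pred.argmax(axis=1)
    return float(np.mean(pred_label == label))

# Hypothetical labels and the same predictions in both formats.
label = np.array([0, 1, 1, 0])
sigmoid_out = np.array([[0.2], [0.8], [0.6], [0.7]])
softmax_out = np.array([[0.8, 0.2], [0.2, 0.8], [0.4, 0.6], [0.3, 0.7]])

print(accuracy(label, sigmoid_out))  # 0.75
print(accuracy(label, softmax_out))  # 0.75
```

Making the branch explicit keeps the thresholding visible instead of hiding it behind a rounding trick.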