Bad prediction accuracy for FeedForward classifier in R

Hi everyone,

I am new to mxnet. I am working with a dataset for which I know that the model and the data allow good classification.

I observe the following phenomenon: I train a simple feed-forward network using mx.model.FeedForward.create and it reports an accuracy in the 80s. But if I apply the returned model to the training data and check the confusion matrix as well as the accuracy, it is only about 3%.
I tried the mxnet examples to make sure that my installation in R works correctly.

I’ve searched the web a lot and tried all different options but can’t find what the issue is. Hopefully somebody can help.

Thanks in advance for your replies!

I am pasting the code that I am using below. Is there any way to attach files? I have an .r file and an .RData file so that people could reproduce the example on their own:

library(mxnet)

# optional: caret is only used for visualizing the confusion matrix
library(caret)

nn.layer.input <- mx.symbol.Variable('data')
nn.layer.fc_3 <- mx.symbol.FullyConnected(data = nn.layer.input, num_hidden = 4)
NN_model <- mx.symbol.SoftmaxOutput(data = nn.layer.fc_3)

devices <- mx.cpu()
model <- mx.model.FeedForward.create(NN_model,
                                     X = data.x.train,
                                     y = data.y.train,
                                     ctx = devices,
                                     array.batch.size = 32,
                                     optimizer = "adam",
                                     begin.round = 10,
                                     num.round = 50,
                                     learning.rate = 1e-3,
                                     eval.metric = mx.metric.accuracy)

prob.y.predict <- predict(model, data.x.train, array.batch.size = 1)
y.predict <- max.col(t(prob.y.predict))

table(y.predict, data.y.train)  # Here you can already see that there is too much confusion to be in the 80s

# optional if caret is available
confusionMatrix(factor(y.predict, 1:4), factor(data.y.train, 1:4))

Update: I’ve figured out that mxnet apparently requires the factor levels to start at 0.
In my case, the values of data.y.train come from the set 1, 2, 3, 4. If I shift data.y.train into the range 0 to 3 and change the line y.predict <- max.col(t(prob.y.predict)) to y.predict <- max.col(t(prob.y.predict)) - 1, everything works.
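For anyone hitting the same issue, here is a minimal sketch of the fix (variable names follow the code above; the extra data.y.train.0 variable is just for illustration):

```r
# Shift labels from the original 1..4 coding to the 0..3 range mxnet expects
data.y.train.0 <- data.y.train - 1

# ... train exactly as above, but with y = data.y.train.0 ...

prob.y.predict <- predict(model, data.x.train, array.batch.size = 1)

# max.col returns a 1-based column index (1..4), so subtracting 1 gives
# back the 0-based class labels that match data.y.train.0
y.predict <- max.col(t(prob.y.predict)) - 1

table(y.predict, data.y.train.0)
```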

Could it be that there is a “bug” in mxnet? I’d expect that having levels from 1 to 4 should behave the same as 0 to 3.

MXNet R doesn’t change the 0-index convention of the underlying C++ library. This applies to other operators as well: mx.symbol.slice, for example, expects begin = 0 to capture the first entry of an array, and many aggregation operators such as mx.symbol.sum use axis 0 to refer to the first axis (typically the batch dimension).

For SoftmaxOutput, the loss is calculated assuming that the prediction associated with label Y is at index Y. But since mxnet is in a 0-index world, if 4 is passed as Y, its associated prediction would be the 5th element of the prediction vector (which doesn’t exist if the last FC layer output has size 4).
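A plain-R illustration of that lookup (no mxnet needed; the probabilities are made up):

```r
# Softmax output of an FC layer with num_hidden = 4: the predictions for
# 0-based classes 0, 1, 2, 3 sit at R's 1-based indices 1..4
probs <- c(0.10, 0.20, 0.30, 0.40)

# mxnet looks up the prediction for 0-based label Y, i.e. position Y + 1
# in R's 1-based terms
label <- 3          # highest valid 0-based label for 4 outputs
probs[label + 1]    # the 4th prediction: fine

label <- 4          # what happens with labels coded 1..4
probs[label + 1]    # out of bounds: there is no 5th prediction (NA in R)
```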

So I wouldn’t call it a bug, but I agree that the 0-index and row-major conventions can cause some confusion from the R perspective.

Thank you for clarifying this. I knew that mxnet might handle data differently from the language calling it. Since R has a nice data abstraction, I thought/hoped that the mxnet functions in R would automatically encode/decode the factors appropriately.

I agree that one shouldn’t consider it a bug. However, I haven’t seen the documentation point that out either (apart from data alignment).