I am a bit confused about which labels MXNet is expecting in a binary classification context.

In my problem, I have a dependent variable which is an array of 1s and 0s, i.e. `[1,0,0,0,1,1,…,0,0,0,1]`. In numpy terms, its shape is `(n_data_points,)`.

Given that, the last two layers of my model are defined as follows:

```python
fc2 = mx.symbol.FullyConnected(data=fc1bn, name='fc2', num_hidden=1)
mlp = mx.symbol.LogisticRegressionOutput(data=fc2, name='softmax')
```

This works perfectly.

The thing is, the following works as well:

```python
fc2 = mx.symbol.FullyConnected(data=fc1bn, name='fc2', num_hidden=2)
mlp = mx.symbol.SoftmaxOutput(data=fc2, name='softmax')
```

whilst I would have expected the above to work only if the dependent variable were one-hot-encoded, i.e. `[[1,0],[0,1],[0,1],[0,1],[1,0],…,[0,1],[1,0]]`, or again, in numpy terms, shaped `(n_data_points, 2)`.
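For reference, such a one-hot version of the label vector can be built in numpy. This is just an illustrative sketch with made-up label values; note it uses the conventional column order (column 0 for class 0, column 1 for class 1):

```python
import numpy as np

# Integer binary labels, shape (n_data_points,) -- illustrative values
y = np.array([1, 0, 0, 0, 1, 1, 0, 1])

# One-hot encoding, shape (n_data_points, 2):
# row i is [1, 0] if y[i] == 0, and [0, 1] if y[i] == 1
y_onehot = np.eye(2, dtype=int)[y]

print(y_onehot.shape)  # (8, 2)
```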

Apparently `SoftmaxOutput` is smart enough to accept the integer labels directly and spit out a probability for each class.
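The binary equivalence between the two heads can be checked numerically: a 2-way softmax over logits `[z0, z1]` assigns the positive class the same probability as a sigmoid over the single logit `z1 - z0`. A small numpy sketch, independent of MXNet:

```python
import numpy as np

def sigmoid(z):
    # Logistic function, as used by a single-output logistic head
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    # Numerically stable softmax over the last axis
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
two_logits = rng.normal(size=(5, 2))             # num_hidden=2 head
one_logit = two_logits[:, 1] - two_logits[:, 0]  # equivalent num_hidden=1 head

p_softmax = softmax(two_logits)[:, 1]  # P(class 1) from the softmax head
p_sigmoid = sigmoid(one_logit)         # P(class 1) from the logistic head

print(np.allclose(p_softmax, p_sigmoid))  # True
```

So mathematically the two formulations describe the same model; the `num_hidden=2` version just parameterizes the decision with one redundant logit.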

Now, the question is, is there a recommended way of structuring a binary classification problem?

Should one use a one-hot-encoded variable or not?

Knowing that `LogisticRegressionOutput` and `SoftmaxOutput` do exactly the same thing in a binary context, which one is recommended?