One or two output neurons for binary classification?

Theoretically, it makes sense to use two output neurons: in the FashionMNIST example, we used 10 output neurons for 10 classes.

But for homework 5.1, when I set the number of output neurons to 2 and used gloss.LogisticLoss(), I got a shape mismatch…
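Here is a minimal sketch in plain Python. It is not the Gluon API, and the helper names are made up for illustration. It assumes the usual setup: a logistic loss takes one score per example, which is likely why two output neurons per example don't line up with one label per example. With two outputs, the natural pairing is a softmax cross-entropy over the two scores (in Gluon, I believe that's gloss.SoftmaxCrossEntropyLoss()). If the second neuron's score is pinned at 0, the two losses give identical values:

```python
import math

def logistic_loss(score, label):
    """Binary logistic loss on ONE score per example (hypothetical helper).
    label is 0 or 1; this equals log(1 + exp(-y * score)) with y in {-1, +1}."""
    y = 1 if label == 1 else -1
    return math.log1p(math.exp(-y * score))

def softmax_cross_entropy(scores, label):
    """Cross-entropy over a vector of class scores; label is the class index (hypothetical helper)."""
    m = max(scores)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]

labels = [1, 0, 1]

# One output neuron: N scores line up with N labels.
scores_1 = [2.0, -1.0, 0.5]
loss_1 = [logistic_loss(s, y) for s, y in zip(scores_1, labels)]

# Two output neurons: N pairs of scores, each paired with an integer class label.
# The first score is fixed at 0 here, so this reproduces loss_1 exactly.
scores_2 = [[0.0, 2.0], [0.0, -1.0], [0.0, 0.5]]
loss_2 = [softmax_cross_entropy(s, y) for s, y in zip(scores_2, labels)]
```

So the two-neuron version isn't wrong. It just needs a loss that expects one score per class instead of one score per example.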

Also, in this discussion on StackExchange, using 1 neuron is preferred because it reduces model complexity.
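To put a number on "less model complexity", here is a rough count for the output layer alone. It assumes a fully connected last layer with weights plus biases, and the hidden width of 256 is a hypothetical value, not one taken from the homework:

```python
def dense_params(in_features, out_features):
    # A fully connected layer has in*out weights plus one bias per output.
    return in_features * out_features + out_features

hidden = 256  # hypothetical width of the last hidden layer
one_output = dense_params(hidden, 1)  # 257
two_outputs = dense_params(hidden, 2)  # 514
```

The rest of the network is unchanged, so the saving is only `hidden + 1` parameters. The stronger argument is that those extra parameters are redundant, not merely numerous.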

Usually for binary classification you output P(Y=1|X), so the network parameterizes a Bernoulli distribution. I’m not sure it makes sense to output two neurons, since you’d want to enforce the constraint that P(Y=0|X) = 1-P(Y=1|X), which makes the second output redundant.
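The redundancy can be checked directly. Below is a small sketch with arbitrary logits, assuming a two-neuron output followed by a softmax. The probability of class 1 from the softmax equals a sigmoid of the logit difference, so a single neuron emitting that difference carries the same information:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

z0, z1 = 0.3, 1.7  # arbitrary logits from a hypothetical two-neuron output layer
p0, p1 = softmax([z0, z1])

# e^{z1} / (e^{z0} + e^{z1}) = 1 / (1 + e^{-(z1 - z0)}): only the difference matters.
p_single = sigmoid(z1 - z0)

# Adding the same constant to both logits changes nothing, another sign of redundancy.
p1_shifted = softmax([z0 + 5.0, z1 + 5.0])[1]
```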
