Multiple losses

ah I see ;). Meanwhile I managed to run several training iterations. The first loss is fine so far. What I observed is that the optimization of the second loss is strange, in the following way:
Let’s assume I have a batch_size of b and an input dimension of n, i.e. the input NDArray x has shape (b, n): rows are samples, cols are bits. In hybrid_forward(self, F, x) I binarize the input x, so that we have b samples of size n, with each entry either 0 or 1. What the second loss should minimize is the squared distance of each bit’s mean (over the batch) from 0.5, so that the bits end up evenly distributed across the samples (e.g. the first bit (col) is 1 for the first 4 samples and 0 for the other 4). What I get when inspecting the output (say for 8 samples, n = 8) is that the cols are
Is that a problem because of how backpropagation in the MXNet backend works, i.e. does it assume a loss vector of shape (batch_size,)?
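
To make concrete what I mean by "evenly distributed", here is a standalone toy check for b = 8, n = 8 (the batch is hand-made for illustration, not my actual data); it also shows the shape the loss ends up with:

import mxnet as mx

# ideal batch: every bit (col) is 1 for exactly 4 of the 8 samples
bits = mx.nd.array([[1, 0, 1, 0, 1, 0, 1, 0]] * 4 +
                   [[0, 1, 0, 1, 0, 1, 0, 1]] * 4)   # shape (8, 8): rows = samples, cols = bits

mu_n = mx.nd.mean(bits, axis=0)      # per-bit mean over the batch -> all 0.5
loss = mx.nd.square(mu_n - 0.5)      # -> all zeros for this ideal batch
print(mu_n.shape, loss.shape)        # both (8,): one entry per bit, not per sample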

def hybrid_forward(self, F, x, **kwargs):
    # binarize: sign() gives -1/+1, shift and scale to 0/1 bits
    y = F.sign(x)
    b = 0.5 * (y + 1)
    # mean of each bit over the batch (axis=0); target is 0.5 per bit
    mu_n = F.mean(b, axis=0)
    loss = F.square(mu_n - 0.5)
    return loss

To further validate this, I’m overfitting to the data (with only the second loss) by repeatedly feeding in the same 8 samples (equal to the batch_size).
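
Concretely, the overfitting check looks roughly like this (the single Dense layer is just a stand-in for my actual network, and the optimizer settings are placeholders):

import mxnet as mx
from mxnet import autograd, gluon

net = gluon.nn.Dense(8)                         # stand-in for the real network producing x
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': 1e-3})

data = mx.nd.random.normal(shape=(8, 8))        # the same 8 samples, fed in repeatedly
for i in range(1000):
    with autograd.record():
        x = net(data)
        b = 0.5 * (mx.nd.sign(x) + 1)           # binarize as in the hybrid_forward above
        mu_n = mx.nd.mean(b, axis=0)
        loss = mx.nd.square(mu_n - 0.5)         # only the second (bit-balance) loss, shape (n,)
    loss.backward()
    trainer.step(batch_size=8)
    if i % 100 == 0:
        print(i, loss.mean().asscalar())        # watch whether the per-bit loss actually moves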