I have a probability model where P(y = 1) = sigmoid( -|| xY ||^2 + c ). I figured I could pretty easily use MXNet to run the maximum likelihood optimization, since it's essentially logistic regression, except that instead of a dot product you have something like a Mahalanobis distance (I'm trying to optimize the parameters Y and c given data x, y). I tried something like:

```
import mxnet as mx

data = mx.sym.Variable("data")
target = mx.sym.Variable("target")
Y = mx.sym.Variable("weight")
fc = mx.sym.FullyConnected(data=data, weight=Y, no_bias=True, num_hidden=10)
bias = mx.sym.Variable("bias")
norm = -mx.sym.sum(mx.sym.square(fc)) + bias
out = mx.sym.LogisticRegressionOutput(data=norm, label=target)
model = mx.mod.Module(symbol=out, data_names=["data"], label_names=["target"])
```

But there are two problems: (1) I cannot use a batch size greater than 1, even if I force bias to have shape (1,), and (2) after training for any number of epochs, the resulting weights are all NaN. Am I making a simple mistake here? Is it possible to use MXNet to optimize a function like this?
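For reference, here is the forward computation I'm trying to reproduce, sketched in plain NumPy (the function and variable names are just for illustration; note that the squared norm is meant to be taken per example, along axis 1):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def prob_y1(x, Y, c):
    """P(y = 1) = sigmoid(-||x Y||^2 + c), computed per row of x.

    x: (batch, d) data matrix
    Y: (d, k) projection matrix (the parameter to learn)
    c: scalar bias (the other parameter to learn)
    """
    z = x @ Y                        # (batch, k) projected data
    sq_norm = np.sum(z ** 2, axis=1) # (batch,) squared norm per example
    return sigmoid(-sq_norm + c)

# Sanity check: with Y = 0 every norm is 0, so P(y = 1) = sigmoid(c) = 0.5
x = np.array([[1.0, 2.0], [0.0, 3.0]])
Y = np.zeros((2, 3))
p = prob_y1(x, Y, 0.0)  # array([0.5, 0.5])
```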