TL;DR: From what I understand, you cannot use `eval.metric` with a custom loss function.

There is a tutorial, which you have probably seen, that explains how to create a custom loss: https://mxnet.incubator.apache.org/versions/master/tutorials/r/CustomLossFunction.html Notably, it never uses even a predefined metric like MSE or MAE. It explains that if you use a custom loss, the output of the model is the gradient of the loss with respect to the input data, so to get a real prediction they have to extract the last `fc2` layer manually.

I played with their code to understand whether a custom or predefined metric can work with a custom loss. To do so, I used the last part of the tutorial, where they use either the predefined loss `MAERegressionOutput` or a custom loss with the same logic as the MAE loss: `lro_abs <- mx.symbol.MakeLoss(mx.symbol.abs(mx.symbol.Reshape(fc2, shape = 0) - label))`. I then used the predefined MAE eval metric in both cases and compared the results.

**Test 1: Using predefined loss and metric**

My expectation is that if `eval.metric` is reported each epoch and I then calculate MAE manually on the training data, I should get similar results. Here is the code I use:

```
# Boston Housing data: convert factors to numeric, scale, split train/test
data(BostonHousing, package = "mlbench")
BostonHousing[, sapply(BostonHousing, is.factor)] <-
  as.numeric(as.character(BostonHousing[, sapply(BostonHousing, is.factor)]))
BostonHousing <- data.frame(scale(BostonHousing))
test.ind <- seq(1, 506, 5)  # 1 pt in 5 used for testing
train.x <- data.matrix(BostonHousing[-test.ind, -14])
train.y <- BostonHousing[-test.ind, 14]
test.x <- data.matrix(BostonHousing[test.ind, -14])
test.y <- BostonHousing[test.ind, 14]

require(mxnet)

# Network: 13 inputs -> 14 hidden units (tanh) -> 1 output
data <- mx.symbol.Variable("data")
label <- mx.symbol.Variable("label")
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1")
tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1")
fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2")
lro_mae <- mx.symbol.MAERegressionOutput(fc2, name = "lro")

mx.set.seed(0)
# Custom metric (defined here but not used below; note the square
# belongs inside the sum for a sum-of-squares metric)
metric <- mx.metric.custom(name = "ssq", feval = function(label, out) {
  sum((as.array(label) - as.array(out))^2) / length(as.array(label))
})
model2 <- mx.model.FeedForward.create(lro_mae, X = train.x, y = train.y,
                                      ctx = mx.cpu(),
                                      num.round = 5,
                                      array.batch.size = 80,
                                      optimizer = "rmsprop",
                                      verbose = TRUE,
                                      array.layout = "rowmajor",
                                      eval.metric = mx.metric.mae,
                                      batch.end.callback = NULL,
                                      epoch.end.callback = NULL)

# Extract the fc2 output manually, as in the tutorial
internals <- internals(model2$symbol)
fc_symbol <- internals[[match("fc2_output", outputs(internals))]]
model3 <- list(symbol = fc_symbol,
               arg.params = model2$arg.params,
               aux.params = model2$aux.params)
class(model3) <- "MXFeedForwardModel"

# Manual MAE on training data: full model output vs fc2 output
pred2 <- predict(model2, train.x)
pred3 <- predict(model3, train.x)
sum(abs(train.y - pred2[1, ])) / length(train.y)
sum(abs(train.y - pred3[1, ])) / length(train.y)
```

If I run this code, I get the following output:

```
Start training with 1 devices
[1] Train-mae=0.712698568900426
[2] Train-mae=0.600305815537771
[3] Train-mae=0.450728197892507
[4] Train-mae=0.40242209037145
[5] Train-mae=0.395647222797076
...
> sum(abs(train.y - pred2[1,])) / length(train.y)
[1] 0.3761493
> sum(abs(train.y - pred3[1,])) / length(train.y)
[1] 0.3761493
```

As you can see, the last two numbers are the same, meaning that the output of the network and the output of `fc2` coincide. They also align quite well with the `eval.metric` output, though not exactly: the per-epoch `Train-mae` is averaged over batches while the weights are still being updated, whereas the manual calculation uses the final weights on the whole training set.

**Test 2: Using manually created loss**

My expectation is that if I change the loss to a custom one that behaves the same way, I should still get a similar result. Here are my code and the output (the only difference is the use of `lro_abs`):

```
# Boston Housing data: convert factors to numeric, scale, split train/test
data(BostonHousing, package = "mlbench")
BostonHousing[, sapply(BostonHousing, is.factor)] <-
  as.numeric(as.character(BostonHousing[, sapply(BostonHousing, is.factor)]))
BostonHousing <- data.frame(scale(BostonHousing))
test.ind <- seq(1, 506, 5)  # 1 pt in 5 used for testing
train.x <- data.matrix(BostonHousing[-test.ind, -14])
train.y <- BostonHousing[-test.ind, 14]
test.x <- data.matrix(BostonHousing[test.ind, -14])
test.y <- BostonHousing[test.ind, 14]

require(mxnet)

# Same network, but with a hand-made MAE loss instead of MAERegressionOutput
data <- mx.symbol.Variable("data")
label <- mx.symbol.Variable("label")
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1")
tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1")
fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2")
lro_abs <- mx.symbol.MakeLoss(mx.symbol.abs(mx.symbol.Reshape(fc2, shape = 0) - label))

mx.set.seed(0)
# Custom metric (defined here but not used below; note the square
# belongs inside the sum for a sum-of-squares metric)
metric <- mx.metric.custom(name = "ssq", feval = function(label, out) {
  sum((as.array(label) - as.array(out))^2) / length(as.array(label))
})
model2 <- mx.model.FeedForward.create(lro_abs, X = train.x, y = train.y,
                                      ctx = mx.cpu(),
                                      num.round = 5,
                                      array.batch.size = 80,
                                      optimizer = "rmsprop",
                                      verbose = TRUE,
                                      array.layout = "rowmajor",
                                      eval.metric = mx.metric.mae,
                                      batch.end.callback = NULL,
                                      epoch.end.callback = NULL)

# Extract the fc2 output manually, as in the tutorial
internals <- internals(model2$symbol)
fc_symbol <- internals[[match("fc2_output", outputs(internals))]]
model3 <- list(symbol = fc_symbol,
               arg.params = model2$arg.params,
               aux.params = model2$aux.params)
class(model3) <- "MXFeedForwardModel"

# Manual MAE on training data: full model output vs fc2 output
pred2 <- predict(model2, train.x)
pred3 <- predict(model3, train.x)
sum(abs(train.y - pred2[1, ])) / length(train.y)
sum(abs(train.y - pred3[1, ])) / length(train.y)
```

```
Start training with 1 devices
[1] Train-mae=0.696901251872381
[2] Train-mae=0.669727434714635
[3] Train-mae=0.780241707960765
[4] Train-mae=0.781373461087545
[5] Train-mae=0.788071354230245
...
> sum(abs(train.y - pred2[1,])) / length(train.y)
Error in pred2[1, ] : incorrect number of dimensions
> sum(abs(train.y - pred3[1,])) / length(train.y)
[1] 0.3761493
```

As you can see, the `eval.metric` results are different. Calculating the metric manually from the model output actually fails with "incorrect number of dimensions", presumably because with `MakeLoss` the model's output is the loss value rather than the prediction, so it has a different shape. Extracting `fc2` manually and calculating the metric from it still works and gives the same result as in the previous test.

From that I have to conclude that `eval.metric` (custom or not) does not work the same way with a custom loss as with a predefined loss: with `MakeLoss`, the metric is fed the loss output rather than the prediction. I am not sure whether there is a way to make a custom metric work directly, but since you can get to the `fc2` output, it should be possible in principle.
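One idea worth trying, which I have not verified in R (so treat it as a sketch under the assumption that `mx.symbol.Group` and `mx.symbol.BlockGrad` behave as their Python counterparts `mx.sym.Group`/`mx.sym.BlockGrad` do): group the custom loss with a gradient-blocked copy of `fc2`, so the prediction is also exposed as a model output without affecting training.

```r
# Hypothetical sketch: expose fc2 alongside the custom loss.
# BlockGrad stops gradients from flowing through the second output,
# so training is still driven only by the MakeLoss branch.
loss <- mx.symbol.MakeLoss(mx.symbol.abs(mx.symbol.Reshape(fc2, shape = 0) - label))
pred <- mx.symbol.BlockGrad(fc2)
net  <- mx.symbol.Group(c(loss, pred))

# If mx.model.FeedForward.create accepts the grouped symbol, a custom
# eval.metric could then read the second output as the real prediction.
```

Whether the R training loop actually passes the second output of a grouped symbol to the metric is exactly what I could not confirm, so this remains a direction to explore rather than a working solution.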