Error with softmax in keras-mxnet

I am using the MXNet backend of Keras to train the Isensee segmentation network found here:

Since I have 7 classes, I want to apply the softmax activation in the output layer, but I get the following error:
RuntimeError: simple_bind error. Arguments: /input_11: (8, 1, 64, 64, 64) /softmax_1_target1: (8, 7, 64, 64, 64) /softmax_1_sample_weights1: (8,) Error in operator broadcast_mul14: [16:48:16] c:\jenkins\workspace\mxnet-tag\mxnet\src\operator\tensor\./elemwise_binary_broadcast_op.h:68: Check failed: l == 1 || r == 1: operands could not be broadcast together with shapes [8,7,64,64] [8]

My input_shape is (1,64,64,64) and n_labels is 7. I have tried several options, all with the same result:
activation_block = Activation(activation_name)(output_layer)
activation_block = Conv3D(n_labels, kernel_size=(1,1,1), activation="softmax")(output_layer)
activation_block = Softmax(axis=1)(output_layer)
where the shape of the output_layer is (None, 7, 64, 64, 64).

Setting activation_name to “sigmoid” does work, but that seems less logical to me since I have a multiclass problem.

Is this a bug or am I doing something wrong?
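
The shapes in the error message can be checked by hand. The sketch below is plain Python rather than MXNet code, and `reduce_axis` and `broadcast_shape` are hypothetical helpers written only for this illustration. It assumes NumPy-style broadcasting, which is what the failed check (`l == 1 || r == 1`) appears to enforce. It shows that [8,7,64,64] is the shape left after reducing the last axis of an (8, 7, 64, 64, 64) tensor, and that this shape cannot be multiplied with the (8,) sample weights. Whether that reduction is really where the shape comes from is an interpretation of the message, not something the thread confirms.

```python
def reduce_axis(shape, axis):
    """Shape that remains after summing or averaging over one axis."""
    axis = axis % len(shape)
    return shape[:axis] + shape[axis + 1:]


def broadcast_shape(a, b):
    """NumPy-style broadcast result of two shapes, or None if incompatible."""
    result = []
    # Compare dimensions starting from the trailing one.
    for i in range(1, max(len(a), len(b)) + 1):
        x = a[-i] if i <= len(a) else 1
        y = b[-i] if i <= len(b) else 1
        if x != y and x != 1 and y != 1:
            return None  # neither side is 1, so the pair cannot broadcast
        result.append(max(x, y))
    return tuple(reversed(result))


prediction = (8, 7, 64, 64, 64)   # (batch, classes, x, y, z), channels_first
sample_weights = (8,)             # one weight per sample

loss_shape_last_axis = reduce_axis(prediction, -1)
print(loss_shape_last_axis)                                   # shape from the error
print(broadcast_shape(loss_shape_last_axis, sample_weights))  # incompatible
print(broadcast_shape((), sample_weights))                    # a scalar loss is fine
```

A scalar loss (shape `()`) broadcasts against the sample weights without trouble, while a per-voxel loss tensor does not.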

You should add a dense layer first, or specify the axis along which softmax is applied.
Softmax thinks that your label’s shape is 8*7*64*64, with labels in range(64), which is not what you really want.
Using reshape, specifying the softmax axis, or manually adding a dense layer may help.

(I only use Keras with TensorFlow/CNTK as a backend; for MXNet, I just use Gluon models.)
(So I am not sure the strategy I described really works.)

Thank you for the reply! This is a fully convolutional network which should output volumetric predictions of shape (7,64,64,64), so I don’t really want to use a dense layer (which will flatten my output). The error stays the same when I specify the axis to apply the softmax:
activation_block = Softmax(axis=1)(output_layer)
I can compile the network, but it gives the error message mentioned above when fitting.

I have just discovered that activation_block = Softmax(axis=1)(output_layer) does work in combination with my custom Dice loss function, but that the error occurs when adding K.categorical_crossentropy(y_true, y_pred) to my Dice loss.

In the definition of my loss function, I can either use:
return 1 - dice
return K.categorical_crossentropy(K.reshape(y_true, (y_true.shape[0], y_true.shape[1], 64*64*64)), K.reshape(y_pred, (y_pred.shape[0], y_pred.shape[1], 64*64*64)))

but not the sum:

return 1 - dice + K.categorical_crossentropy(K.reshape(y_true, (y_true.shape[0], y_true.shape[1], 64*64*64)), K.reshape(y_pred, (y_pred.shape[0], y_pred.shape[1], 64*64*64)))

Which is what I would like to do. Any suggestion how this can be solved?

What about a reshape before Softmax?
AFAIK categorical_crossentropy works for y_pred=(batch_size,label_size) and y_true=(batch_size,1),
so a reshape that transforms y_pred=(1,64,64,64,7) to y_pred=(1*64*64*64,7) may help.
Maybe you could try activation_block = Softmax()(output_layer.reshape((-1,7))) with label=label.reshape((-1,))

First of all, I don’t really want to reshape my output before the activation function, since I need volumetric data. I guess it’s also not really necessary, since I can use activation_block = Softmax(axis=1)(output_layer) in combination with the categorical_crossentropy loss.

However, I have noticed that there is a huge difference in loss value between
K.categorical_crossentropy(y_true, y_pred) (initial values around 30)
and
K.categorical_crossentropy(K.reshape(y_true, (y_true.shape[0], y_true.shape[1], 64*64*64)), K.reshape(y_pred, (y_pred.shape[0], y_pred.shape[1], 64*64*64))) (values around 1e7).

I also see that you use notations for channels_last while I’m using channels_first.

Is it possible to specify an axis when calculating categorical_crossentropy loss?

I was able to give the axis argument to the categorical_crossentropy function, this works fine:
K.categorical_crossentropy(y_true, y_pred, axis=1)
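
To illustrate what reducing over the class axis means for channels_first data, here is a minimal pure-Python sketch for a single sample with 3 classes and 2 voxels. It is a simplification: `crossentropy_over_classes` is a hypothetical helper, the toy values are made up, and the real Keras backend function does more (for example clipping and renormalising the predictions).

```python
import math


def crossentropy_over_classes(y_true, y_pred, eps=1e-7):
    """Per-voxel cross-entropy for one sample.

    y_true and y_pred are lists indexed as [class][voxel], i.e. the class
    axis comes first, as in channels_first. The sum runs over the class
    axis, so the result has one value per voxel.
    """
    n_classes = len(y_pred)
    n_voxels = len(y_pred[0])
    losses = []
    for v in range(n_voxels):
        loss = -sum(
            y_true[c][v] * math.log(max(y_pred[c][v], eps))
            for c in range(n_classes)
        )
        losses.append(loss)
    return losses


# Toy data: voxel 0 belongs to class 0, voxel 1 belongs to class 1.
y_true = [[1, 0], [0, 1], [0, 0]]
y_pred = [[0.7, 0.1], [0.2, 0.8], [0.1, 0.1]]

per_voxel = crossentropy_over_classes(y_true, y_pred)
mean_loss = sum(per_voxel) / len(per_voxel)
print(per_voxel)   # one loss value per voxel
print(mean_loss)   # a single scalar
```

Summing over a spatial axis instead of the class axis would add up log terms across many voxels, which may be part of why the two variants above give such different magnitudes.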

However, I still have the same issue with my custom loss function. This is its definition:

def combined_catcrossentropy_total_dice_loss(y_true, y_pred):
    smooth = 1
    weights = [0.00080562, 0.00068257, 0.0031111, 0.22452312, 0.1969243, 0.06613467, 0.50781862]
    dice = 0
    for index in range(7):
        y_true_f = K.flatten(y_true[:,index,:,:,:])
        y_pred_f = K.flatten(y_pred[:,index,:,:,:])
        intersection = K.sum(y_true_f * y_pred_f)
        denom = K.sum(y_true_f + y_pred_f)
        dice += weights[index] * (2. * intersection + smooth) / (denom + smooth)
    return 1 - dice + K.categorical_crossentropy(y_true, y_pred, axis=1)

I always presumed that categorical_crossentropy returns a value, not a tensor, such that it could be added to 1-dice. Apparently, this is not the case. Is there a way to solve this?

Using .mean()?
Or .mean().asscalar() if you want a scalar rather than an NDArray object.

This did indeed fix my problem, thank you very much! So this is the loss function that I’m using now:
1 - dice + K.mean(K.categorical_crossentropy(y_true, y_pred, axis=1))
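
As a sanity check of the arithmetic, the combined loss can be sketched in plain Python on toy data. This is not the Keras implementation: `weighted_dice` and `mean_crossentropy` are hypothetical helpers that mirror the loop in the loss above for a single sample stored as [class][voxel], and the weights are made-up values that sum to 1.

```python
import math


def weighted_dice(y_true, y_pred, weights, smooth=1.0):
    """Class-weighted Dice score; inputs are indexed as [class][voxel]."""
    dice = 0.0
    for c, w in enumerate(weights):
        intersection = sum(t * p for t, p in zip(y_true[c], y_pred[c]))
        denom = sum(y_true[c]) + sum(y_pred[c])
        dice += w * (2.0 * intersection + smooth) / (denom + smooth)
    return dice


def mean_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy summed over classes, then averaged over voxels."""
    n_voxels = len(y_pred[0])
    total = 0.0
    for v in range(n_voxels):
        total -= sum(
            y_true[c][v] * math.log(max(y_pred[c][v], eps))
            for c in range(len(y_pred))
        )
    return total / n_voxels


weights = [0.2, 0.3, 0.5]                      # hypothetical class weights
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]     # 3 classes, 3 voxels, one-hot
y_pred = [[0.8, 0.1, 0.2], [0.1, 0.7, 0.2], [0.1, 0.2, 0.6]]

# Both terms are scalars, so they can simply be added.
loss = 1 - weighted_dice(y_true, y_pred, weights) + mean_crossentropy(y_true, y_pred)
perfect = 1 - weighted_dice(y_true, y_true, weights) + mean_crossentropy(y_true, y_true)
print(loss)
print(perfect)   # close to 0 for a perfect prediction
```

Because the cross-entropy is averaged before it is added, the result is a single number, which is the same idea as wrapping the Keras call in K.mean.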

Just out of curiosity: how does Keras handle the categorical_crossentropy usually if it returns a tensor? Does it internally calculate the mean as loss value?

I think MXNet tries to work out in_grad and out_grad separately.
So if you pass a tensor of length n, MXNet can give you gradients for all n results.
MXNet handles the rest, so you don’t need to worry about what happens when you pass a tensor.
