MXNet C++: multi-label classifier

Hello,

I’m training a multi-label classification model using the C++ API.
Let’s say I have N labels. Each sample may have any combination of labels.
E.g. (N=6):
sample1, 0,1,0,0,1,1   # 1 - label is assigned, 0 - not
sample2, 1,1,0,1,0,0

I use LogisticRegressionOutput as my final layer (I believe it uses a cross-entropy loss function).
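For reference, here is a minimal NumPy sketch (not MXNet code) of why a sigmoid output combined with a binary cross-entropy loss gives the simple gradient sigmoid(x) - label with respect to the pre-activation. This is the gradient LogisticRegressionOutput is understood to backpropagate, up to scaling. The function names and the sample logits are illustrative assumptions; the labels are sample1 from above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_loss(logits, labels):
    # Binary cross-entropy summed over all label positions
    p = sigmoid(logits)
    return -np.sum(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))

def bce_grad(logits, labels):
    # Analytic gradient of bce_loss with respect to the logits
    return sigmoid(logits) - labels

logits = np.array([0.5, -1.2, 2.0, 0.0, -0.3, 1.1])   # made-up network outputs
labels = np.array([0.0, 1.0, 0.0, 0.0, 1.0, 1.0])     # sample1 from the question

# Central finite differences as a sanity check of the analytic gradient
eps = 1e-6
numeric = np.array([
    (bce_loss(logits + eps * e, labels) - bce_loss(logits - eps * e, labels)) / (2 * eps)
    for e in np.eye(len(logits))
])
analytic = bce_grad(logits, labels)
print(np.allclose(numeric, analytic, atol=1e-5))
```

Each label position contributes its own independent term to the gradient, which is what makes per-position masking possible.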

Everything seems to be working fine on my initial tests when labels have only 0/1 values.
In my case, the label for a certain position may also be unknown (e.g. the activity wasn’t measured). So every position holds one of: 0, 1 or ‘undefined’.

I want these ‘undefined’ values to be ignored during training (e.g. I don’t want the gradients to be pulled towards 0 or some other value).

Any suggestions on how to deal with this situation?
Can it be done without writing custom loss and activation functions? (e.g: what will happen if I use NAN for these undefined positions)?
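To illustrate the NaN question, here is a small NumPy sketch (plain array arithmetic, not MXNet) of what happens when a NaN label is used without any masking. It assumes the usual gradient sigmoid(x) - label and a hypothetical bias shared by all positions; the values are made up. MXNet is assumed to follow the same floating-point rules, but that is not verified here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([0.5, -1.2, 2.0, 0.3])      # made-up network outputs
labels = np.array([1.0, 0.0, 0.0, np.nan])    # last label is 'undefined'

# Per-position gradient: NaN appears only at the undefined position
naive_grad = sigmoid(logits) - labels

# Summed cross-entropy loss: a single NaN term makes the whole sum NaN
p = sigmoid(logits)
total_loss = -np.sum(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))

# Gradient of a hypothetical bias shared by all positions: also NaN
shared_bias_grad = naive_grad.sum()

print(naive_grad, total_loss, shared_bias_grad)
```

The NaN stays local in the per-position gradient, but it spreads to any parameter that receives contributions from every position, so the undefined entries need to be masked before they reach shared weights.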

Thanks,
Eugene

Hi @eugeneraush

Can you try to do the following:

Create a mask symbol that contains 1 for known labels and 0 for unknown labels, and store NaN in the label at the unknown positions.
Before passing the data through LogisticRegressionOutput, use sym.where to replace the output of your network with the label value (NaN) wherever the label is unknown, so that no gradient flows back to your network at those positions.

Here is an example using the symbol API in Python, which is close to the cpp-package API:

import mxnet as mx
import numpy as np

# Creating the symbols holding our data
x = mx.sym.var('x')
mask = mx.sym.var('mask')
label = mx.sym.var('label')

# Keep the network output where the label is known; elsewhere take the (NaN) label
filtered_output = mx.sym.where(mask, x, label)
# Passing the updated symbols through the LogisticRegressionOutput
output = mx.sym.LogisticRegressionOutput(data=filtered_output, label=label)
# Binding the symbol shapes
exe = output.simple_bind(mx.cpu(), x=(4,1), label=(4,1), mask=(4,1))

# Sample label of the form [1, 0, 0, Unknown]; NaN marks the unknown entry
d_label = np.array([1, 0, 0, np.nan], dtype=np.float32).reshape((4,1))

# Corresponding mask: 1 where the label is known, 0 where it is unknown
d_mask = (~np.isnan(d_label)).astype(np.float32)

# Sample network output
d_x = mx.nd.uniform(-3, 3, shape=(4,1))

# Passing the data through the network (is_train=True so that backward can be called)
exe.forward(is_train=True, x=d_x, mask=mx.nd.array(d_mask), label=mx.nd.array(d_label))

exe.backward()
# Gradient with respect to x
exe.grad_dict['x']
[[-0.32012987]
 [ 0.91181   ]
 [ 0.6455507 ]
 [ 0.        ]]
<NDArray 4x1 @cpu(0)>
exe.arg_dict
{'mask': 
 [[1.]
  [1.]
  [1.]
  [0.]]
 <NDArray 4x1 @cpu(0)>, 'x': 
 [[2.4386644]
  [1.7087817]
  [0.8489599]
  [1.6334143]]
 <NDArray 4x1 @cpu(0)>, 'label': 
 [[ 1.]
  [ 0.]
  [ 0.]
  [nan]]
 <NDArray 4x1 @cpu(0)>}

As we see here, the gradient for the unknown label has been masked.
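The reason the masked gradient comes out as exactly 0 rather than NaN can be sketched by hand in NumPy. This is an illustration of the chain rule under two assumptions: the output layer backpropagates sigmoid(data) - label, and the backward pass of where routes the incoming gradient to x only where the mask is set, writing 0 elsewhere. The input values are made up and this is not MXNet's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.75, 2.3, 0.6, 1.6])           # made-up network outputs
label = np.array([1.0, 0.0, 0.0, np.nan])     # last label is unknown
mask = ~np.isnan(label)                       # True where the label is known

# Forward: take x where the label is known, the NaN label elsewhere
filtered = np.where(mask, x, label)
out = sigmoid(filtered)

# Backward through the output layer: NaN at the unknown position
grad_filtered = out - label

# Backward through where: x receives the gradient only where mask is True
grad_x = np.where(mask, grad_filtered, 0.0)

print(grad_x)
```

Because where selects values instead of multiplying by the mask, the NaN at the unknown position is discarded rather than propagated (0 * NaN would still be NaN).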

Hi Thomas,

mx.sym.where is exactly what I need!
I’ve tested it in C++ and it works fine.

Thanks!
Eugene