Hello,
I’m training a multi-label classification model using the C++ API.
Let’s say I have N labels. Then each sample may have any combination of labels.
E.g. (N=6):
sample1, 0,1,0,0,1,1  # 1 = label is assigned, 0 = not
sample2, 1,1,0,1,0,0
…
I use LogisticRegressionOutput as my final layer (I believe it uses a cross-entropy loss function).
Everything seems to be working fine on my initial tests when labels have only 0/1 values.
In my case there can also be a situation where we don’t know the label for a certain position (e.g. the activity wasn’t measured). So for every position we have 0, 1, or ‘undefined’.
I want these ‘undefined’ values to be ignored during training (i.e. I don’t want gradients pulling them toward 0 or some other value).
Any suggestions on how to deal with this situation?
Can it be done without writing custom loss and activation functions? (E.g. what will happen if I use NaN for these undefined positions?)
Thanks,
Eugene
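(Editor’s note: for context, here is a minimal NumPy sketch, not from the thread, of why feeding NaN labels straight into a sigmoid cross-entropy loss does not "ignore" them: the gradient of that loss with respect to the logits is sigmoid(x) - y, so a NaN label produces a NaN gradient that poisons the update rather than a zero one.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logits from a hypothetical network and labels with one unmeasured position
logits = np.array([2.0, -1.0, 0.5, 1.5])
labels = np.array([1.0, 0.0, 0.0, np.nan])  # last label is unknown

# Gradient of sigmoid cross-entropy w.r.t. the logits: sigmoid(x) - y
grad = sigmoid(logits) - labels
print(grad)  # the last entry is NaN, which would propagate through the update
```

This is why the NaN positions need to be explicitly masked out rather than left in the label array as-is.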
Hi @eugeneraush
Can you try the following: have a mask symbol that contains 1 for known labels and 0 for unknown labels. Before passing the data through the LogisticRegressionOutput, use sym.where to set your network’s output to the label itself (NaN) wherever the label is unknown, so that no gradient flows back through those positions.
Here is an example using the symbol API in Python, which is close to the C++ package API:
import mxnet as mx
import numpy as np
from mxnet import nd

# Creating the symbols holding our data
x = mx.sym.var('x')
mask = mx.sym.var('mask')
label = mx.sym.var('label')
# Where the mask is 0 (unknown label), replace the network output
# with the label itself (NaN); the where op blocks the gradient there
filtered_output = mx.sym.where(mask, x, label)
# Passing the updated symbol through the LogisticRegressionOutput
output = mx.sym.LogisticRegressionOutput(data=filtered_output, label=label)
# Binding the symbol shapes
exe = output.simple_bind(mx.cpu(), x=(4,1), label=(4,1), mask=(4,1))
# Sample label of the form [1, 0, 0, Unknown]
d_label = np.array([1, 0, 0, np.nan], dtype='float32').reshape((4,1))
# Corresponding mask: 1 where the label is known, 0 where it is NaN
d_mask = (~np.isnan(d_label)).astype('float32')
# Sample network output
d_x = mx.nd.uniform(-3, 3, (4,1))
# Passing the data through the network (is_train=True so we can call backward)
exe.forward(is_train=True, x=d_x, mask=nd.array(d_mask), label=nd.array(d_label))
exe.backward()
exe.grad_arrays[1]
[[-0.32012987]
[ 0.91181 ]
[ 0.6455507 ]
[ 0. ]]
<NDArray 4x1 @cpu(0)>
exe.arg_dict
{'mask':
[[1.]
[1.]
[1.]
[0.]]
<NDArray 4x1 @cpu(0)>, 'x':
[[2.4386644]
[1.7087817]
[0.8489599]
[1.6334143]]
<NDArray 4x1 @cpu(0)>, 'label':
[[ 1.]
[ 0.]
[ 0.]
[nan]]
<NDArray 4x1 @cpu(0)>}
As we see here, the gradient for the unknown label has been masked.
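(Editor’s note: the masking effect can also be reproduced in plain NumPy, as a sketch not taken from the thread. The gradient of sigmoid cross-entropy with respect to the logits is sigmoid(x) - y; selecting through a mask, as the where op does in its backward pass, zeroes it at the unknown positions.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical network logits and labels with one unknown (NaN) entry
logits = np.array([2.0, -1.0, 0.5, 1.5])
labels = np.array([1.0, 0.0, 0.0, np.nan])
mask = ~np.isnan(labels)  # True where the label is known

# Masked gradient: sigmoid(x) - y where the label is known, 0 elsewhere.
# np.where selects per element, so the NaN in the unselected branch is dropped.
grad = np.where(mask, sigmoid(logits) - labels, 0.0)
print(grad)  # last entry is exactly 0: no pull from the unknown label
```

This mirrors what the executor output above shows: the unknown position contributes exactly zero gradient instead of NaN.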
Hi Thomas,
mx.sym.where is exactly what I need!
I’ve tested it in C++ and it works fine.
Thanks!
Eugene