Hi there.
I need to build a mask symbol out of a list of indexes.
For example given a set of indexes idxs:
idxs=[[1,2],[2,3],[0,1]]
then I want a mask like the following
mask= [[0,1,1,0],[0,0,1,1],[1,1,0,0]].
where idxs has shape [batch_size, num_indexes]
and mask has shape [batch_size, max_values]
At the moment I am using the following:
mask = mx.sym.sum(
    mx.sym.one_hot(idxs, depth=max_values),
    axis=1)
Unfortunately, this takes a lot of GPU memory: the intermediate one-hot tensor scales as batch_size * max_values * num_indexes. I was wondering if anyone here has some ideas on how to do this in a more memory-efficient way.
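For reference, the underlying idea can be sketched in plain Python (not MXNet): instead of materializing a [batch_size, num_indexes, max_values] one-hot tensor and summing it, write the ones into the mask directly (a scatter), which touches only batch_size * num_indexes elements. max_values = 4 is taken from the example above.

```python
# Scatter-style mask construction: one write per listed index,
# no intermediate one-hot tensor.
idxs = [[1, 2], [2, 3], [0, 1]]
max_values = 4  # assumed from the example mask width

mask = [[0] * max_values for _ in idxs]
for row, index_list in enumerate(idxs):
    for j in index_list:
        mask[row][j] = 1

print(mask)  # [[0, 1, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0]]
```

This is only an illustration of the memory argument; on the GPU the equivalent would be a scatter-type operator rather than a Python loop.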
import mxnet as mx
# indicates which indices and data relate to which rows
indptr = mx.nd.array([0, 2, 4, 6])
## row 0 is 0:2 from indices and data
## row 1 is 2:4 from indices and data
## row 2 is 4:6 from indices and data
# same as your `idxs` but flattened
indices = mx.nd.array([1, 2, 2, 3, 0, 1])
# all 1s in your example
data = mx.nd.array([1, 1, 1, 1, 1, 1])
a = mx.nd.sparse.csr_matrix((data, indices, indptr), shape=(3, 4))
a.asnumpy()
# array([[0., 1., 1., 0.],
#        [0., 0., 1., 1.],
#        [1., 1., 0., 0.]], dtype=float32)
Hi @thomelane, thanks for the reply. Unfortunately sparse arrays are not an option for me, since we are running on GPUs and I believe they are not supported there.
Hi,
Thank you for the replies!
I need to mask a large softmax layer. I am doing policy gradient, but not all the options are always available, so I am masking out the options that are not available. In practice, I implemented a numerically stable softmax that returns non-zero probabilities only for the indices contained in the idxs symbol, as in the example.
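A minimal pure-Python sketch of such a masked, numerically stable softmax (the function name and single-row signature are illustrative, not the actual implementation):

```python
import math

def masked_softmax(logits, allowed):
    """Softmax over only the allowed indices; all others get probability 0.

    logits: list of floats, length max_values
    allowed: list of valid indices for this row
    """
    # Subtract the max over the *allowed* logits for numerical stability.
    m = max(logits[i] for i in allowed)
    exps = {i: math.exp(logits[i] - m) for i in allowed}
    z = sum(exps.values())
    return [exps[i] / z if i in exps else 0.0 for i in range(len(logits))]

probs = masked_softmax([1.0, 2.0, 3.0, 4.0], allowed=[1, 2])
# probs puts all its mass on indices 1 and 2; indices 0 and 3 are exactly 0.0
```

In a batched GPU implementation the same effect is what the one-hot mask achieves: multiply the exponentials by the mask before normalizing.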
I believe sparse arrays are the right way to do it, but given that there is limited support I am not super-keen in following this path.
Hi, I think it is something much simpler. I have a bunch of options and contexts. My model should output the best options given some context. I know at training time that some of the options are not available for some of the contexts. So I mask the output of my softmax to return non-zero probabilities only for the available options.
Does this make sense?
I see. Would it be helpful if MXNet supported elementwise multiplication of csr * dense = csr on GPU?
In this way you only need a sparse mask, and get a sparse output.
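To illustrate why the output stays sparse, here is a plain-Python sketch of csr * dense (the helper function is hypothetical, not an MXNet API): only the stored nonzeros of the CSR operand are ever visited, and the result reuses the same indices and indptr.

```python
def csr_elemwise_mul_dense(data, indices, indptr, dense):
    """Elementwise multiply a CSR matrix by a dense matrix.

    Only the stored nonzeros are touched, so the result is CSR
    with the same sparsity pattern (same indices and indptr).
    """
    out = []
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            out.append(data[k] * dense[row][indices[k]])
    return out, indices, indptr

# The all-ones CSR mask from the example, times a dense 3x4 matrix:
data, indices, indptr = [1] * 6, [1, 2, 2, 3, 0, 1], [0, 2, 4, 6]
dense = [[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]]
new_data, _, _ = csr_elemwise_mul_dense(data, indices, indptr, dense)
print(new_data)  # [11, 12, 22, 23, 30, 31]
```

The work is proportional to the number of nonzeros in the mask, not to batch_size * max_values.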