`argwhere` implementation

Hi Guys,
In NumPy there is a quite useful function for locating elements, numpy.argwhere. I am wondering whether it is possible to have this implemented for MXNet NDArray as well?
Thanks a lot!

Unfortunately, there is no out-of-the-box implementation.

You could use CSRNDArray to get something very close, but you still need to write custom code to expand its indptr into actual row indices.
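To make the indptr step concrete, here is a minimal pure-Python sketch of how a CSR row-pointer array expands into one row index per stored value. The helper name `indptr_to_rows` is made up for illustration and is not part of MXNet, and the hand-written `indptr`/`indices` values are assumed to be the CSR layout of the matrix [[2, 0, 7], [0, 5, 9]].

```python
def indptr_to_rows(indptr):
    """Expand a CSR row-pointer list into one row index per stored value.

    The stored values of row i live at positions indptr[i] .. indptr[i+1]-1,
    so row i contributes (indptr[i+1] - indptr[i]) copies of i.
    """
    rows = []
    for row, (start, end) in enumerate(zip(indptr[:-1], indptr[1:])):
        rows.extend([row] * (end - start))
    return rows


# Hypothetical hand-written CSR layout of [[2, 0, 7], [0, 5, 9]]
indptr = [0, 2, 4]      # row 0 owns entries 0..1, row 1 owns entries 2..3
indices = [0, 2, 1, 2]  # column index of each stored value

rows = indptr_to_rows(indptr)
coords = [list(pair) for pair in zip(rows, indices)]
print(coords)  # [[0, 0], [0, 2], [1, 1], [1, 2]]
```

Rows without any stored values simply contribute nothing, because their start and end offsets are equal.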

I have created an implementation for you, but unfortunately that code won’t be hybridizable as it uses shape information. It also depends on using asscalar(), so it will be a sync call to CPU.

I don’t know whether my implementation is faster than just converting your array to NumPy and then using argwhere. Nevertheless, I re-implemented the Code 1 example using MXNet, and here it is:

import mxnet as mx
import numpy as np

# Convert to CSRNDArray
input_array = mx.nd.array([[2, 0, 7],
                           [0, 5, 9]])
sparse_array = input_array.tostype('csr')

# Column indices of the non-zero values
col_indices = sparse_array.indices

# Row indices of the non-zero values. indptr holds, for each row, the offset of its
# first non-zero value, so consecutive differences give the non-zero count per row.
# Note: asscalar() blocks until the value is available (a sync call).
row_indices = mx.nd.concat(
    *[mx.nd.repeat(row_num, repeats=int((current - prev).asscalar()))
      for prev, current, row_num in
      zip(sparse_array.indptr[0:-1], sparse_array.indptr[1:],
          mx.nd.arange(0, input_array.shape[0]))], dim=-1).astype(np.int64)

# Combine row indices and column indices into (row, col) pairs
non_zero_indices = mx.nd.concat(*[row_indices.expand_dims(axis=1),
                                  col_indices.expand_dims(axis=1)], dim=1)
print(non_zero_indices)


When I run it, the printed output matches the result of the NumPy version:

[[0 0]
 [0 2]
 [1 1]
 [1 2]]
<NDArray 4x2 @cpu(0)>
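For comparison, the NumPy baseline mentioned earlier can be sketched as follows. This assumes the data is already a plain NumPy array; starting from an MXNet NDArray you would first call asnumpy() on it, which is itself a sync call.

```python
import numpy as np

# Same matrix as in the MXNet example, as a plain NumPy array
dense = np.array([[2, 0, 7],
                  [0, 5, 9]])

# argwhere returns one (row, col) pair per non-zero element
expected = np.argwhere(dense)
print(expected)
```

Timing both variants on your own data is the only reliable way to tell which one is faster.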

Thanks a lot! Your implementation is really helpful! What do you mean by “a sync call to CPU” here?

All MXNet operations run in async mode (see system architecture), meaning that as soon as you fire an operation (whether on CPU or GPU), MXNet returns control to your code and processes the operation in another C++ thread. To explicitly request the results, you use one of the so-called “sync methods”, which hold the execution of the program until the result is available. The most common ones are asnumpy() and asscalar().

Thus, I meant that if you wanted to take advantage of GPU speed here, it won’t be as fast as you would expect, because the code uses a sync call (asscalar()): at that point the data on the GPU has to be synced with the CPU. If you run this code on CPU you won’t notice any difference, but on GPU it will not be as efficient as a native implementation could be.
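The difference between firing an operation and waiting for its result can be illustrated with a small analogy. This is not MXNet's engine, just Python's standard-library thread pool: `submit` returns right away, much like dispatching an operator, and `result()` blocks, much like asnumpy() or asscalar(). The `slow_op` function is a made-up stand-in for an expensive operator.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_op(x):
    """Stand-in for an operator that takes a while to compute."""
    time.sleep(0.2)
    return x * 2


with ThreadPoolExecutor(max_workers=1) as pool:
    start = time.perf_counter()
    future = pool.submit(slow_op, 21)  # returns immediately, work runs in another thread
    dispatch_time = time.perf_counter() - start

    value = future.result()            # blocks until the result is available
    total_time = time.perf_counter() - start

print(f"dispatch: {dispatch_time:.3f}s, total: {total_time:.3f}s, value: {value}")
```

Calling a blocking method inside a loop, as the list comprehension with asscalar() does, forces one such wait per iteration, which is why it limits the benefit of running on GPU.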
