In NumPy, there is a quite useful location function,
numpy.argwhere. I am just wondering whether it is possible to have this implemented in MXNet NDArray as well?
Thanks a lot!
Unfortunately, there is no out-of-the-box implementation.
You could use CSRNDArray to get something very close, but you still need to write custom code to split its
indptr into actual row indices.
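To illustrate that splitting step, here is a minimal pure-Python sketch; the indptr values are the ones a CSRNDArray would produce for the 2x3 example matrix used below:

```python
# Sketch: derive row indices from a CSR indptr array in pure Python.
# indptr[i]:indptr[i+1] is the slice of non-zero values belonging to row i,
# so (indptr[i+1] - indptr[i]) is how many times row index i must be repeated.
indptr = [0, 2, 4]  # e.g. for [[2, 0, 7], [0, 5, 9]]: 2 non-zeros in each row

row_indices = []
for row, (start, end) in enumerate(zip(indptr[:-1], indptr[1:])):
    row_indices.extend([row] * (end - start))

print(row_indices)  # [0, 0, 1, 1]
```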
I have created an implementation for you, but unfortunately that code won’t be hybridizable, as it uses shape information. It also depends on
asscalar(), which is a sync call to the CPU.
I don’t know if my implementation is faster than just converting your array to
NumPy and then using
argwhere, but, nevertheless, I re-implemented the Code 1 example using MXNet, and here it is:
```python
import mxnet as mx
import numpy as np

# Convert to CSRNDArray
input_array = mx.nd.array([[2, 0, 7], [0, 5, 9]])
sparse_array = input_array.tostype('csr')

# Get non-zero column indices of the data
col_indices = sparse_array.indices

# Get non-zero row indices using indptr, which stores the offset of the
# first non-zero value of each row
row_indices = mx.nd.concat(
    *[mx.nd.repeat(row_num, repeats=int((current - prev).asscalar()))
      for prev, current, row_num in zip(sparse_array.indptr[0:-1],
                                        sparse_array.indptr[1:],
                                        mx.nd.arange(0, input_array.shape[0]))],
    dim=-1).astype(np.int64)

# Combine row indices and column indices
non_zero_indices = mx.nd.concat(*[row_indices.expand_dims(axis=1),
                                  col_indices.expand_dims(axis=1)], dim=1)
print(non_zero_indices)
```
If I execute it, the output is:

```
[[0 0]
 [0 2]
 [1 1]
 [1 2]]
<NDArray 4x2 @cpu(0)>
```
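For comparison, the plain NumPy route mentioned above yields the same indices:

```python
import numpy as np

a = np.array([[2, 0, 7], [0, 5, 9]])
# argwhere returns one [row, col] pair per non-zero element
print(np.argwhere(a))
# [[0 0]
#  [0 2]
#  [1 1]
#  [1 2]]
```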
Thanks a lot! Your implementation is really helpful! What do you mean by “a sync call to CPU” here?
All MXNet operations run in async mode (see the system architecture docs), meaning that as soon as you fire an operation (on either CPU or GPU), MXNet returns control back to your code and processes the operation in another C++ thread. To explicitly request a result, you use one of the so-called “sync methods”, which block the execution of the program until the result is available. The most common ones are asnumpy() and asscalar().
Thus, I meant that if you wanted to take advantage of GPU speed here, it won’t be as fast as you would expect, because there is a sync call (
asscalar()) used in the code: at that point the data on the GPU must be synced with the CPU. If you run this code on the CPU you won’t notice any difference, but running it on the GPU will not be as efficient as a native implementation could be.