The sparse `_backward_dot` operator is slow, and its runtime seems to depend on the dimensions rather than on the number of non-zero entries.
In the benchmark, the number of non-zero entries is kept constant while the feature dimension is varied.
With num_features = 1,000,000 the results are -
With num_features = 100,000,000 the results are -
As the results show, there is no appreciable difference in the time for dot, but the time for backward dot and for the Adam update increases with the feature dimension.
The benchmarking was done on CPU, on a Mac notebook.
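For reference, the structure of the measurement can be sketched with SciPy as a stand-in for MXNet (the function name, sizes, and defaults below are illustrative, not from the original benchmark script): nnz is held fixed while num_features grows, and the forward dot and the weight-gradient dot are timed separately.

```python
import time
import numpy as np
import scipy.sparse as sp

def bench(num_features, nnz=10_000, batch=64, out_dim=8, seed=0):
    """Time a sparse forward dot and its weight-gradient dot (SciPy proxy)."""
    rng = np.random.default_rng(seed)
    # CSR data matrix with a fixed number of non-zero entries
    x = sp.csr_matrix(
        (rng.standard_normal(nnz),
         (rng.integers(0, batch, nnz), rng.integers(0, num_features, nnz))),
        shape=(batch, num_features))
    w = rng.standard_normal((num_features, out_dim))
    ograd = rng.standard_normal((batch, out_dim))

    t0 = time.perf_counter()
    y = x @ w                  # forward: dot(csr, dense)
    t1 = time.perf_counter()
    gw = x.T @ ograd           # backward: weight gradient dot(csr.T, dense)
    t2 = time.perf_counter()
    assert y.shape == (batch, out_dim) and gw.shape == (num_features, out_dim)
    return t1 - t0, t2 - t1

for nf in (100_000, 1_000_000):
    fwd, bwd = bench(nf)
    print(f"num_features={nf:>9}: forward {fwd:.4f}s, backward {bwd:.4f}s")
```

Note that in this SciPy proxy the backward dot materializes a dense `(num_features, out_dim)` gradient, so its cost necessarily scales with the dimension; MXNet can instead produce a `row_sparse` gradient here, which is what makes the observed dimension dependence surprising.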
Is there scope for improving the `_backward_dot` operator? I tried to follow the code in https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/dot-inl.h, but could not work out why backward dot depends on the dimension size.
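Mathematically, the weight gradient of `dot(X, W)` is `X.T @ dY`, and only columns of `X` that hold a non-zero can produce a non-zero gradient row, so a row-sparse gradient is bounded by nnz regardless of num_features. A small SciPy sketch of that expectation (illustrative names only, not the MXNet implementation):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
batch, num_features, out_dim, nnz = 4, 1000, 3, 6
x = sp.csr_matrix(
    (rng.standard_normal(nnz),
     (rng.integers(0, batch, nnz), rng.integers(0, num_features, nnz))),
    shape=(batch, num_features))
ograd = np.ones((batch, out_dim))

# Weight gradient, kept sparse: only columns of x with a non-zero
# contribute a non-zero row, so the gradient has at most nnz such rows.
gw = (x.T @ sp.csr_matrix(ograd)).tocsr()
nonzero_rows = np.unique(gw.nonzero()[0])
print(len(nonzero_rows) <= nnz)
```

If this reasoning holds, the backward pass should be achievable in time proportional to nnz, which is why the observed scaling with num_features looks like an implementation issue rather than an inherent cost.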