# MXNet vs NumPy: incredibly slow

Code:

```python
import numpy as np
from mxnet import nd

def gen(size, step, count):
    r = np.empty(shape=(count, size), dtype='int32')
    for i in range(count):
        r[i] = np.arange(i, i + size * step, step)
    return r

def gen_mx(size, step, count):
    r = nd.empty(shape=(count, size), dtype='int32')
    for i in range(count):
        r[i] = nd.arange(i, i + size * step, step)
    return r
```
```python
%timeit gen(1000, 2, 100000)
473 ms ± 2.59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
```python
%timeit gen_mx(1000, 2, 100000)
41.8 s ± 2.37 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

What am I doing wrong? Why is it so slow compared with numpy?

Hi Melvin,

I’m not very familiar with the internals of `NDArray`, but vectorized operations will be much faster than a plain `for` loop.

Here is an example:

```python
def gen_mx2(col_size, step, row_size):
    # column of row indices, broadcast across the columns
    A = nd.arange(row_size, dtype='int32').reshape(row_size, 1).broadcast_axes(axis=1, size=col_size)
    # row of column offsets (scaled by step), broadcast across the rows
    B = nd.broadcast_axes(nd.arange(col_size, dtype='int32').reshape(1, col_size) * step, axis=0, size=row_size)
    return A + B
```

To avoid ambiguity, I changed the argument names; you can check with some small inputs that the two functions produce the same result.
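The same broadcast construction can be checked in plain NumPy, without MXNet installed. This is a sketch; `gen_np2` is an illustrative name for the NumPy analogue of `gen_mx2`:

```python
import numpy as np

def gen(size, step, count):
    # loop-based generator from the question
    r = np.empty(shape=(count, size), dtype='int32')
    for i in range(count):
        r[i] = np.arange(i, i + size * step, step)
    return r

def gen_np2(col_size, step, row_size):
    # vectorized: entry (i, j) is i + step * j
    A = np.arange(row_size, dtype='int32').reshape(row_size, 1)
    B = np.arange(col_size, dtype='int32').reshape(1, col_size) * step
    return A + B  # broadcasting expands both to (row_size, col_size)

# small-size equivalence check
print(np.array_equal(gen(5, 2, 4), gen_np2(5, 2, 4)))
```

Row `i` of the loop version is `arange(i, i + size * step, step)`, i.e. `i + step * arange(size)`, which is exactly what the broadcast sum computes.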

And on my laptop (cpu), the results are:

```python
%timeit gen(1000, 2, 100000)
538 ms ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit gen_mx2(1000, 2, 100000)
2.06 s ± 14.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

We can see that `mxnet.nd` is still slower than `numpy`, but not by nearly as much as in your previous test.

Since there is no real tensor computation in this demo benchmark, only array initialization, my guess is that MXNet loses time to extra memory copies or per-operation overhead, but that is only a guess. We could run another test, such as matrix multiplication, where the work is actual compute.
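A rough sketch of such a follow-up benchmark (the matrix size and run count are arbitrary choices, and the MXNet half is guarded in case the library is not installed). Note that `NDArray` operations run asynchronously, so the timing calls `wait_to_read()` to force the computation to finish:

```python
import timeit
import numpy as np

n = 512  # illustrative size
a = np.random.rand(n, n).astype('float32')
b = np.random.rand(n, n).astype('float32')

# time 10 NumPy matrix multiplications
np_time = timeit.timeit(lambda: a @ b, number=10)
print(f'numpy matmul: {np_time:.4f} s for 10 runs')

try:
    from mxnet import nd
    am, bm = nd.array(a), nd.array(b)

    def mx_matmul():
        # wait_to_read() blocks until the async computation
        # has actually finished, so the timing is honest
        nd.dot(am, bm).wait_to_read()

    mx_time = timeit.timeit(mx_matmul, number=10)
    print(f'mxnet matmul: {mx_time:.4f} s for 10 runs')
except ImportError:
    print('mxnet not installed; skipping the mxnet half')
```

On a compute-bound workload like this, both libraries mostly dispatch to the same BLAS routines, so the gap should be far smaller than in the initialization benchmark.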