Code:

```
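import numpy as np
from mxnet import nd

# NumPy version: fill a (count, size) int32 matrix one row at a time
def gen(size, step, count):
    r = np.empty(shape=(count, size), dtype='int32')
    for i in range(count):
        r[i] = np.arange(i, i + size * step, step)
    return r

# MXNet version: the same row-by-row loop using NDArrays
def gen_mx(size, step, count):
    r = nd.empty(shape=(count, size), dtype='int32')
    for i in range(count):
        r[i] = nd.arange(i, i + size * step, step)
    # block until MXNet's asynchronous engine has finished all the writes
    r.wait_to_read()
    return r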
```
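A likely reason for the gap is per-call overhead, not the arithmetic. The loop issues 100,000 separate `arange` + row-assignment operations, and each MXNet NDArray operation is dispatched individually to its asynchronous execution engine, which costs much more per call than a small NumPy call. A sketch of the usual workaround, shown here with NumPy only, is to drop the Python loop and build the whole matrix with one broadcast expression. Row `i` is `i + step * [0, 1, ..., size - 1]`, so a column of row offsets plus a row of scaled column indices gives every row at once. The name `gen_vectorized` is hypothetical and not part of the original code.

```python
import numpy as np

def gen(size, step, count):
    # Loop version from the question: one np.arange call per row.
    r = np.empty(shape=(count, size), dtype='int32')
    for i in range(count):
        r[i] = np.arange(i, i + size * step, step)
    return r

def gen_vectorized(size, step, count):
    # Hypothetical loop-free variant.
    # offsets has shape (count, 1): [[0], [1], ..., [count - 1]]
    offsets = np.arange(count, dtype='int32')[:, None]
    # columns has shape (size,): [0, step, 2*step, ..., (size - 1)*step]
    columns = np.arange(size, dtype='int32') * step
    # Broadcasting (count, 1) + (size,) yields the full (count, size) matrix.
    return offsets + columns
```

For example, `np.array_equal(gen(1000, 2, 100), gen_vectorized(1000, 2, 100))` should return `True`. The same idea may carry over to MXNet by building the two ranges with `nd.arange`, reshaping the offsets to a column, and combining them with a broadcasting op such as `nd.broadcast_add`. That variant is not shown or timed here.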

```
%timeit gen(1000, 2, 100000)
473 ms ± 2.59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

```
%timeit gen_mx(1000, 2, 100000)
41.8 s ± 2.37 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

What am I doing wrong? Why is it so slow compared with numpy?

CPU load (while running mxnet):