The following code raises an error:

```
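import mxnet as mx

# Two identical 2x3 arrays placed on different GPUs
a = mx.ndarray.array([[1, 2, 3], [4, 5, 6]], ctx=mx.gpu(0))
b = mx.ndarray.array([[1, 2, 3], [4, 5, 6]], ctx=mx.gpu(1))

# Concatenating across devices triggers the error shown after this block
mx.ndarray.concat(a, b, dim=1)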
```

raise MXNetError(py_str(_LIB.MXGetLastError()))

mxnet.base.MXNetError: [05:42:09] /mxnet-1.2.0/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:62: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered

The goal is to concatenate multiple NDArrays and convert the result into a NumPy array. The NDArray objects are produced by prediction with a model that was trained on multiple GPUs using model parallelism, so they live on different devices. Converting each NDArray to a NumPy array with asnumpy() and then calling numpy.concatenate() did not seem efficient, so I checked whether it is possible to concatenate with mxnet.ndarray.concat first and then convert the merged array to a NumPy array. Any suggestions?