NDArray.concat failed to concatenate two arrays on different GPUs?


#1

The following code raised an error:

import mxnet as mx
a = mx.ndarray.array([[1,2,3],[4,5,6]],ctx=mx.gpu(0))
b = mx.ndarray.array([[1,2,3],[4,5,6]],ctx=mx.gpu(1))
mx.ndarray.concat(a,b,dim=1)

raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [05:42:09] /mxnet-1.2.0/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:62: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered

The goal is to concatenate multiple NDArrays and convert the result into a NumPy array. These NDArray objects were generated by prediction with a model trained on multiple GPUs using model parallelism. Converting each NDArray into a NumPy array with asnumpy() and then calling numpy.concatenate() did not seem efficient, so I checked whether it is possible to concatenate with mxnet.ndarray.concat first and then convert the merged array into a NumPy array. Any suggestions?


#2

Actually, you can first make sure they are in the same context by using as_in_context().
The reason asnumpy() works is that it copies each array to the CPU first, so the concatenation then happens there.

For the context issue, you can simply use copyto(), if I understand you correctly.


#3

Unfortunately, the arrays have to be in the same context before you can perform an operation on them. The simplest way to do that is to call the as_in_context(ctx) method with the context you want all of them to be in. This context doesn’t need to be a CPU. Here is an example:

import mxnet as mx
a = mx.ndarray.array([[1,2,3],[4,5,6]],ctx=mx.gpu(0))
b = mx.ndarray.array([[1,2,3],[4,5,6]],ctx=mx.gpu(1))
b_copy = b.as_in_context(a.context)
mx.ndarray.concat(a,b_copy,dim=1)

Which produces:

[[ 1.  2.  3.  1.  2.  3.]
 [ 4.  5.  6.  4.  5.  6.]]
<NDArray 2x6 @gpu(0)>

#4

The answers are clear and straightforward. Thank you, guys.