What exactly is a non-blocking call?

I understand blocking calls such as asnumpy() or asscalar() can slow down GPU computation. But I am a bit confused about what's blocking versus non-blocking, even after reading https://mxnet.apache.org/versions/1.5.0/tutorials/gluon/gotchas_numpy_in_mxnet.html. My vague understanding is that anything involving copying from main memory to the GPU is blocking.

If so, are any of the following computations blocking calls?

from mxnet import gpu, nd
# x is defined on GPU
x = nd.ones(shape=(10,), ctx=gpu(0))
# y and z are on CPU, I assume
y = 2.0
z = 5

result1 = x * 2.
result2 = x * y
result3 = x.reshape(shape=(5,-1))
result4 = x.reshape(shape=(z,-1))

Thank you!


The terms blocking and non-blocking in that document refer to synchronous and asynchronous functions. They are not about memory copies per se: a memory copy itself can be either blocking or non-blocking.

Hi, I bumped into a similar issue and maybe I can report to you what they told me here.
MXNet works in an asynchronous way, i.e. some processes take place in parallel and not sequentially, not waiting for each other to finish.

When I had this problem, I was trying to time the execution of a function, so I had something like:

import time

start_t = time.time()                                  # initial time
class_IDs, scores, bounding_boxes = net(rgb_nd)        # function I wanted to time
stop_t = time.time()                                   # time after function execution
elapsed = stop_t - start_t                             # don't name this "time": it would shadow the module

and I got an incredibly low value for the elapsed time, i.e. it seemed the execution of the function was blazingly fast.

Truth is, the three NDArray objects “class_IDs”, “scores” and “bounding_boxes” were really just handles that would point to the real data once it became available. Code execution continues even though that data is not ready yet (that's why this is a non-blocking call), so I was only measuring the time it took MXNet to push the operation onto its queue and kick off the neural net.
A blocking call is anything that forces the code to wait until the actual data is available: for example, you could print it, copy it to a NumPy array, or something like that.
You can even force it to wait by using:

class_IDs, scores, bounding_boxes = net(rgb_nd)        # non-blocking: returns immediately
class_IDs.wait_to_read()                               # blocks until the data is actually computed

At least, this is my understanding of how the whole asynchronous mechanism works…hope I didn’t say anything wrong and that it helped!
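To make the timing pitfall concrete outside of MXNet, here is a plain-Python sketch of the same idea (an analogy using the standard library's concurrent.futures, not MXNet code): submitting work returns a future immediately, and only calling .result() blocks until the computation is done, much like an NDArray handle versus wait_to_read().

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_square(x):
    time.sleep(0.5)          # stand-in for a long GPU computation
    return x * x

with ThreadPoolExecutor() as pool:
    start = time.time()
    fut = pool.submit(slow_square, 7)   # non-blocking: returns a handle immediately
    enqueue_time = time.time() - start

    result = fut.result()               # blocking: waits for the real value
    total_time = time.time() - start

print(enqueue_time)   # tiny: we only measured the submission
print(total_time)     # at least 0.5 s: includes the actual computation
print(result)         # 49
```

Timing only up to the submit() call gives a misleadingly small number, which is exactly what happened when timing net(rgb_nd) without a synchronization point.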


That is a great example, Lews, for checking (feeling) whether a function is blocking or non-blocking. It is not obvious in @hyu 's case, as his example runs fast even if all the functions were blocking. For real workloads, yes, it will show the difference. Thanks for sharing your experience here.


Thanks. This is very helpful.

Going back to my original example, if

x = nd.ones(shape=(10,), ctx=gpu(0))
y = 2.0

Then the operation of
result = x*y
involves the following operations:
op1. copy y from main memory to GPU
op2. broadcast if needed
op3. multiplication

and all the operations are non-blocking. Does this sound right?

Not exactly. In your last example, y is a Python scalar, and it stays in the CPU context. y is not copied to the GPU, and there is no broadcast either. It is a scalar-vector multiply on the GPU.

Thanks. Can you explain why y is not copied to the GPU? I was always under the impression that operands need to be in the same context.
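My mental model of why the scalar case is fine (a toy sketch, not MXNet internals; the class name and checks below are made up for illustration): array operands carry a context and must match, while a Python scalar has no context at all, so it can be passed straight into the kernel launch as a parameter and there is nothing to copy.

```python
# Toy model of context checking -- NOT MXNet internals.
class FakeNDArray:
    def __init__(self, data, ctx):
        self.data = data
        self.ctx = ctx           # e.g. "cpu(0)" or "gpu(0)"

def multiply(a, b):
    # A Python scalar has no context: it is baked into the operator
    # call itself, so no device copy is needed.
    if isinstance(b, (int, float)):
        return FakeNDArray([v * b for v in a.data], a.ctx)
    # Two array operands must live in the same context.
    if a.ctx != b.ctx:
        raise ValueError("operands must be in the same context")
    return FakeNDArray([u * v for u, v in zip(a.data, b.data)], a.ctx)

x = FakeNDArray([1.0] * 4, ctx="gpu(0)")
print(multiply(x, 2.0).data)     # scalar operand: fine, result stays on gpu(0)

w = FakeNDArray([2.0] * 4, ctx="cpu(0)")
try:
    multiply(x, w)               # array operand on a different context
except ValueError as e:
    print(e)                     # operands must be in the same context
```

The "operands must be in the same context" rule only applies between arrays; that is why `x * nd.array([2.0])` with a CPU array would fail, while `x * 2.0` works.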