Thread-safety in C++ API

cpp
performance

#1

Hello,

I’m using the C++ API and tried to add some parallelization using OpenMP threads in the training loop. In particular, I tried to run a few batches (forward/backward) in parallel and then manually collect the gradients from each run and apply the update.
I copy all the arguments for each thread and create a separate Executor by calling net.SimpleBind in each thread.
From time to time I get a bunch of NaNs in one thread or another, while the single-threaded version always works fine, so my question is:
Are the following calls thread-safe (assuming the Executor and the SimpleBind arguments are local to each thread)?

  • Symbol::SimpleBind
  • Executor::Forward
  • Executor::Backward

For example, is the following schematic code supposed to work?

Executor *exec_per_thread[num_threads];
map<string, NDArray> args_per_thread[num_threads];
#pragma omp parallel num_threads(num_threads)
{
    int thread_num = omp_get_thread_num();
    // compute this thread's batch range: from_index and to_index
    // locally prepare args_per_thread[thread_num]
    args_per_thread[thread_num]["data"] = X_train.Slice(from_index, to_index).Copy(ctx);
    // other arguments
    exec_per_thread[thread_num] = net.SimpleBind(ctx, args_per_thread[thread_num]);
    exec_per_thread[thread_num]->Forward(true);
    exec_per_thread[thread_num]->Backward();
}

Also, for this test I’m using MXNET_ENGINE_TYPE=NaiveEngine to make sure that MXNet’s internal multi-threading doesn’t interfere with mine.

Thanks,
Eugene
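
For the "manually collect and update gradients" step described in this post, a minimal standalone sketch might look like the following. It does not use MXNet at all: the gradients are plain std::vector<float> buffers, and AverageGradients is a hypothetical helper name, not part of any library. Averaging (rather than summing) is an assumption that depends on how the loss is normalized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: combines one gradient buffer per thread into a single
// averaged gradient. Intended to run on one thread after all workers finish.
std::vector<float> AverageGradients(
    const std::vector<std::vector<float>>& per_thread_grads) {
    assert(!per_thread_grads.empty());
    const std::size_t n = per_thread_grads[0].size();
    std::vector<float> avg(n, 0.0f);
    for (const auto& grad : per_thread_grads) {
        assert(grad.size() == n);  // every thread must report the same shape
        for (std::size_t i = 0; i < n; ++i) avg[i] += grad[i];
    }
    const float scale = 1.0f / static_cast<float>(per_thread_grads.size());
    for (float& v : avg) v *= scale;
    return avg;
}
```

Doing the reduction after the parallel region, on a single thread, keeps the shared parameter update itself free of races regardless of what the library does internally.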


#2

Hi,

I don’t believe the Executor::Forward call is thread-safe. Specifically, the Executor objects are not thread-local, AFAIK. The behavior of your code snippet would be unpredictable, although I can’t see why you would be getting NaNs.
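
If a call cannot be assumed thread-safe, one conservative workaround is to serialize every call into the library behind a single mutex. The sketch below illustrates only that pattern: FakeExecutor and g_scratch are hypothetical stand-ins (not MXNet types) that model separate instances sharing hidden state. This trades away the parallel speedup for the guarded section, so it is mainly useful for confirming whether a race is the cause.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical hidden state shared by all instances (assumption for the demo).
static double g_scratch = 0.0;

// Hypothetical stand-in for an object whose calls are not thread-safe.
struct FakeExecutor {
    double input = 0.0;
    double output = 0.0;
    void Forward() {
        g_scratch = input * 2.0;  // writes shared state
        output = g_scratch;       // reads it back; racy without a lock
    }
};

// Runs one worker thread per input, each with its own FakeExecutor, and
// guards every Forward() call with the same mutex.
std::vector<double> RunSerialized(const std::vector<double>& inputs) {
    std::vector<FakeExecutor> execs(inputs.size());
    std::mutex library_mutex;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < inputs.size(); ++t) {
        execs[t].input = inputs[t];
        workers.emplace_back([&execs, &library_mutex, t] {
            std::lock_guard<std::mutex> lock(library_mutex);
            execs[t].Forward();
        });
    }
    for (auto& w : workers) w.join();
    std::vector<double> outputs;
    for (const auto& e : execs) outputs.push_back(e.output);
    return outputs;
}
```

If the NaNs disappear once the real calls are serialized this way, that points toward shared state inside the library rather than a bug in the per-thread setup.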


#3

Thank you!
NaNs are quite strange to me too. I suspect that if the Executor object is not thread-safe, there might be some temporary buffers shared between multiple instances, which could affect the Forward/Backward calculations.
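
To narrow down where the NaNs first appear, it may help to scan each thread's buffers right after Forward and Backward. The helper below is a standalone sketch with a hypothetical name (FirstNonFinite); it assumes the values have already been copied out into a plain float array, and it does not call any MXNet function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Returns the index of the first NaN or infinity in data[0..n), or -1 if
// every value is finite.
long FirstNonFinite(const float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        if (!std::isfinite(data[i])) return static_cast<long>(i);
    }
    return -1;
}
```

Logging the thread number together with the returned index after each step would show whether the bad values start in the outputs or only in the gradients.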