Hello,
I’m using the C++ API and tried to add some parallelization with OpenMP threads in the training loop. Specifically, I run a few batches (forward/backward) in parallel and then manually collect the gradients from each run and apply the update.
I copy all the arguments for each thread and create a separate Executor by calling net.SimpleBind in each thread.
From time to time I get a bunch of NaNs in one thread or another, while the single-threaded version always works fine, so my question is:
Are the following calls thread-safe (assuming the Executor and the SimpleBind arguments are local to each thread)?
- Symbol::SimpleBind
- Executor::Forward
- Executor::Backward
For example, is the following schematic code supposed to work?
Executor *exec_per_thread[num_threads];
map<string, NDArray> args_per_thread[num_threads];
#pragma omp parallel num_threads(num_threads)
{
    int thread_num = omp_get_thread_num();
    // calculate the "thread batch" from_index and to_index
    // locally prepare args_per_thread[thread_num]
    args_per_thread[thread_num]["data"] = X_train.Slice(from_index, to_index).Copy(ctx);
    // ... other arguments
    exec_per_thread[thread_num] = net.SimpleBind(ctx, args_per_thread[thread_num]);
    exec_per_thread[thread_num]->Forward(true);
    exec_per_thread[thread_num]->Backward();
}
Also, for this test I set MXNET_ENGINE_TYPE=NaiveEngine to make sure that MXNet's internal multi-threading doesn't interfere with mine.
Thanks,
Eugene