Slow speed when running multiple experiments with ImageIter

Hi,

I am seeing a significant slowdown when running several experiments in parallel. ImageIter becomes much slower at reading batches when multiple experiments run at the same time: the time to read a batch goes from ~1s with 1 job running to ~4s with 2 jobs running. Is there a known cause for this? If so, how can I speed it up? This is a reference to the iterator I am trying to use.
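For reference, per-batch read times like the ones above can be measured with a small helper along these lines. This is only a sketch: `time_batches` is a hypothetical name, not part of MXNet, and it assumes the iterator supports the normal Python iteration protocol.

```python
import time

def time_batches(batch_iter, num_batches=10):
    """Return the wall-clock seconds spent fetching each of the first batches."""
    timings = []
    it = iter(batch_iter)
    for _ in range(num_batches):
        start = time.perf_counter()
        try:
            next(it)  # fetch one batch; decoding and augmentation happen here
        except StopIteration:
            break  # the iterator ran out before num_batches were read
        timings.append(time.perf_counter() - start)
    return timings
```

Running this once with a single job and again with two jobs in parallel gives per-batch numbers to compare.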

I have been unable to find the bottleneck. I tried cloning the Python environment and copying the RecordIO file fed to the iterator so that every experiment reads a separate file, but neither helped.

Another thing I noticed: the source code of ImageIter reads the number of threads from the environment variable MXNET_CPU_WORKER_NTHREADS but does not create any threads. Is this intended?

I figured out that the bottleneck was occurring during data augmentation. Changing OMP_NUM_THREADS helped.
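In case it helps someone else, here is a rough sketch of setting OMP_NUM_THREADS per job from inside the script. The even split of cores across jobs and the names `threads_per_job` and `NUM_PARALLEL_JOBS` are my own assumptions for illustration; the right value probably depends on the machine and the augmentations used. The likely explanation is that each job starts one OpenMP thread per core by default, so parallel jobs compete for the same cores, but I have not verified this in detail.

```python
import os

def threads_per_job(num_parallel_jobs, total_cores=None):
    """Split the available cores evenly across jobs, with at least one thread each."""
    if total_cores is None:
        total_cores = os.cpu_count() or 1
    return max(1, total_cores // num_parallel_jobs)

# Assumption: two experiments share this machine.
NUM_PARALLEL_JOBS = 2

# Set the variable before importing mxnet (or other OpenMP-backed libraries),
# since the thread count is typically read when the library initializes.
os.environ["OMP_NUM_THREADS"] = str(threads_per_job(NUM_PARALLEL_JOBS))

# import mxnet as mx  # import only after the variable is set
```

The same effect can be had by exporting OMP_NUM_THREADS in the shell before launching each experiment.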