I am seeing a significant slowdown when running several experiments in parallel. ImageIter becomes much slower at reading batches when multiple experiments run at the same time: the time to read a batch goes from ~1s with 1 job running to ~4s with 2 jobs running. Is there a known cause for this? If so, how can I speed it up? This is a reference to the iterator I am trying to use.
I have been unable to find the bottleneck. I tried cloning the Python environment, and I tried copying the RecordIO file that is fed to the iterator so that every experiment reads a separate file, but neither helped.
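For reference, the per-batch read times can be measured along these lines. This is only a minimal sketch: `time_batches` and `fake_batches` are hypothetical helper names, and `fake_batches` is a stand-in generator so the snippet runs without MXNet. The assumption is that the real iterator can be passed in its place, since the helper only relies on Python's iterator protocol.

```python
import time


def time_batches(batches, num_batches=10):
    """Return the wall-clock time (seconds) spent fetching each batch.

    `batches` is any Python iterable; the real data iterator is assumed
    to be usable here in place of the stand-in below.
    """
    timings = []
    it = iter(batches)
    for _ in range(num_batches):
        start = time.perf_counter()
        try:
            next(it)  # only the fetch itself is timed
        except StopIteration:
            break  # iterator ran out before num_batches were read
        timings.append(time.perf_counter() - start)
    return timings


def fake_batches(n, delay=0.01):
    """Hypothetical stand-in for a data iterator: yields n dummy batches."""
    for i in range(n):
        time.sleep(delay)  # simulates the cost of reading one batch
        yield i


timings = time_batches(fake_batches(5), num_batches=3)
print("mean seconds per batch:", sum(timings) / len(timings))
```

Running the same measurement with one job and then with two jobs in parallel should show whether the slowdown is in the batch fetch itself or somewhere else in the training loop.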
Another thing I noticed is that the ImageIter source code reads the number of threads from the environment variable MXNET_CPU_WORKER_NTHREADS but does not create any threads. Is this intended?
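To compare the configured thread count against the available cores in each job, something like the following could be run inside every experiment. It is a sketch only: `worker_thread_report` is a hypothetical helper, it only reads the environment variable named above, and it makes no claim about how MXNet itself uses that value.

```python
import os


def worker_thread_report(env=None):
    """Report the configured worker-thread setting next to the CPU count.

    `env` defaults to the process environment; a plain dict can be
    passed instead, which makes the helper easy to check in isolation.
    """
    env = os.environ if env is None else env
    raw = env.get("MXNET_CPU_WORKER_NTHREADS")
    # None means the variable is not set in this environment.
    threads = int(raw) if raw is not None else None
    return {
        "MXNET_CPU_WORKER_NTHREADS": threads,
        "cpu_count": os.cpu_count(),
    }


print(worker_thread_report())
```

If the thread setting multiplied by the number of parallel jobs is larger than the reported CPU count, the jobs may be competing for the same cores, which would be one possible explanation to rule out.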