Dataloader with num_workers > 0 crashes


My DataLoader for an image-based dataset often crashes when num_workers > 0, apparently due to Python multiprocessing. I get the following error:

IOError: [Errno 104] Connection reset by peer

With num_workers = 0 (the default) I have no issues, other than training being very slow. Is this issue related to OpenCV threading? I am using Python 2.7.


What’s the version of MXNet you’re using?


I am using 1.3 (master branch)


Are you using the mxnet.image library or OpenCV directly? If you are using OpenCV directly, does the problem go away if you use only mxnet.image calls?


I am using the following call:

data = mx.image.imread(image_path, flag=1)


Does reducing num_workers resolve the issue?


Even with a reduced num_workers it hangs while receiving the data. My dataset returns a tuple of 4 NDArrays; maybe pickling is slow?
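
One rough way to check the pickling suspicion is to time a plain pickle round trip on a single sample. The sketch below uses NumPy arrays as a stand-in for the 4 NDArrays, and the shapes are made up. It measures ordinary pickle only, not the shared-memory path the DataLoader uses, so treat the number as a baseline rather than a measurement of the real transfer cost.

```python
import pickle
import time

import numpy as np


def time_pickle_roundtrip(sample, repeats=10):
    """Return the average seconds for one pickle dumps + loads of `sample`."""
    start = time.time()
    for _ in range(repeats):
        payload = pickle.dumps(sample, protocol=pickle.HIGHEST_PROTOCOL)
        pickle.loads(payload)
    return (time.time() - start) / repeats


# Stand-in for one dataset item: a tuple of 4 arrays (hypothetical shapes).
sample = tuple(np.zeros((3, 224, 224), dtype=np.float32) for _ in range(4))
seconds = time_pickle_roundtrip(sample)
print("avg pickle round trip: %.6f s" % seconds)
```

If the round trip is small compared with the time a worker spends loading and decoding an image, pickling is probably not what makes the loader hang.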


NDArray pickling uses shared memory when num_workers > 0, so for performance the data is not copied when it is pickled. I have, however, heard a few users claim that using NumPy to transfer data between processes is faster than using NDArrays with shared-memory pickling. I always assumed that they were doing something wrong, because NumPy doesn't support shared memory AFAIK, but maybe there is something I'm missing.
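
If you want to try the NumPy route those users describe, a minimal sketch is a wrapper that converts each NDArray in a sample with asnumpy() before it leaves the worker. AsNumpyDataset is a hypothetical name, not part of MXNet, and the sketch assumes your dataset returns a tuple per item, as described above.

```python
class AsNumpyDataset(object):
    """Hypothetical wrapper: returns each sample with NDArrays converted to NumPy.

    Any element that has an `asnumpy()` method (as mx.nd.NDArray does) is
    converted; everything else is passed through unchanged.
    """

    def __init__(self, dataset):
        self._dataset = dataset

    def __len__(self):
        return len(self._dataset)

    def __getitem__(self, idx):
        sample = self._dataset[idx]
        return tuple(
            item.asnumpy() if hasattr(item, "asnumpy") else item
            for item in sample
        )
```

You would wrap your existing dataset, e.g. AsNumpyDataset(my_dataset), and pass the wrapper to the DataLoader in place of the original. Whether this is actually faster, or avoids the hang, is something you would have to measure on your setup.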


related issues: