Dataloader with num_workers > 0 crashes


#1

My dataloader for my image-based dataset with num_workers > 0 often crashes due to Python multiprocessing. Specifically, I get the following error:

IOError: [Errno 104] Connection reset by peer

With num_workers = 0 (the default) I have no issues, other than that training is very slow. Is this issue related to OpenCV threading? I am using Python 2.7.


#2

What’s the version of MXNet you’re using?


#3

I am using 1.3 (master branch)


#4

Are you using the mxnet.image library or OpenCV directly? If you are using OpenCV directly, does the problem go away if you only use mxnet.image calls?


#5

I am using the following call:

data = mx.image.imread(image_path, flag=1)

#6

Does reducing num_workers resolve the issue?


#7

Even with num_worker it hangs while receiving the data. My dataset returns a tuple of 4 NDArrays; maybe pickling is slow?
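
One rough way to check the "pickling is slow" guess is to time a plain pickle round trip of a single dataset item. The sketch below is only an illustration: `time_pickle_roundtrip` and the byte-buffer `sample` are hypothetical stand-ins, not part of MXNet, and an in-process `pickle` call is not necessarily the path the DataLoader workers take, so treat the number as a rough bound rather than a measurement of the real transfer.

```python
import pickle
import time


def time_pickle_roundtrip(sample, repeats=100):
    """Return (average seconds per dumps/loads round trip, last restored copy)."""
    restored = None
    start = time.perf_counter()
    for _ in range(repeats):
        payload = pickle.dumps(sample, protocol=pickle.HIGHEST_PROTOCOL)
        restored = pickle.loads(payload)
    elapsed = time.perf_counter() - start
    return elapsed / repeats, restored


# Hypothetical stand-in for one dataset item: a tuple of 4 image-sized buffers.
# Replace it with a real item (e.g. dataset[0]) to measure your own data.
sample = tuple(bytes(3 * 224 * 224) for _ in range(4))
avg_seconds, restored = time_pickle_roundtrip(sample)
print("average round trip: %.6f s" % avg_seconds)
```

If the round trip takes a small fraction of the time spent loading and decoding one image, pickling is unlikely to be the bottleneck, and the hang is more likely to come from somewhere else.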


#8

NDArray pickling uses shared memory when num_workers > 0, so pickling doesn’t copy the data between processes, which helps performance. I have, however, heard of a few users claiming that using NumPy to transfer data between processes is faster than using NDArrays with shared-memory pickling. I always assumed that they’re doing something wrong because NumPy doesn’t support shared memory AFAIK, but maybe there is something I’m missing.
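
To make the difference between the two transfer styles concrete, here is a minimal sketch using only the Python 3.8+ standard library. It is not MXNet's implementation: `compare_payload_sizes` is a hypothetical helper, and the zero-filled buffer stands in for image data. It shows that pickling the data itself produces a payload as large as the data, while putting the data in shared memory means only a small handle has to be pickled.

```python
import pickle
from multiprocessing import shared_memory


def compare_payload_sizes(num_bytes=1_000_000):
    """Return (copied_size, handle_size, data_matches) for the two styles."""
    data = bytes(num_bytes)  # stand-in for image pixel data

    # Style 1: pickle the buffer itself; the payload grows with the data.
    copied = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

    # Style 2: place the data in shared memory and pickle only a small handle.
    shm = shared_memory.SharedMemory(create=True, size=num_bytes)
    try:
        shm.buf[:num_bytes] = data
        handle = pickle.dumps((shm.name, num_bytes),
                              protocol=pickle.HIGHEST_PROTOCOL)

        # A receiver attaches by name instead of reading the bytes from a pipe.
        name, size = pickle.loads(handle)
        peer = shared_memory.SharedMemory(name=name)
        try:
            data_matches = bytes(peer.buf[:size]) == data
        finally:
            peer.close()
    finally:
        shm.close()
        shm.unlink()  # free the shared block once nobody needs it
    return len(copied), len(handle), data_matches


copied_size, handle_size, data_matches = compare_payload_sizes()
print("pickled copy: %d bytes, pickled handle: %d bytes" % (copied_size, handle_size))
```

The handle stays small no matter how large the buffer is, which is why a shared-memory transfer is normally expected to be cheaper than sending a full pickled copy.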


#9

related issues: