MXNet Forum

Gluon DataLoader hangs


#1

I am having issues with the Gluon DataLoader: most of the time it hangs for a long time at the start of iteration. I am running on an Amazon EC2 DL AMI (Ubuntu).

I assume it needs to copy the whole dataset object to each worker process, which makes it slow. My dataset contains a list of 200k strings (image paths). Is there a way to speed this up so I don't have to wait 10-20 minutes at the start?
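One rough way to check the copying assumption is to measure how long it takes to serialize a list of that size. This is only a sketch: the paths below are made-up placeholders, and it assumes the dataset is handed to workers via pickle, which may not be how the DataLoader actually transfers it on your platform.

```python
import pickle
import time

# Hypothetical stand-in for the real dataset: 200k image-path strings.
paths = ["/data/images/img_{:06d}.jpg".format(i) for i in range(200_000)]

# Time a single serialization of the whole list.
start = time.perf_counter()
payload = pickle.dumps(paths, protocol=pickle.HIGHEST_PROTOCOL)
elapsed = time.perf_counter() - start

print("pickled size: {:.1f} MiB".format(len(payload) / 2**20))
print("pickle time:  {:.3f} s".format(elapsed))
```

If the reported time is a small fraction of the observed startup delay, copying the path list is probably not the main cause, and disk I/O is worth looking at next.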


#2

If you're using a large num_workers, the issue may be that all workers start creating batches at the same time, which can create I/O contention. How long does the startup take if you use num_workers=2?
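To compare startup times across worker counts, something like the following could be used. It is a minimal sketch: `time_first_batch` and `make_loader` are hypothetical names introduced here, and the Gluon usage in the comments assumes a `dataset` object and a batch size of 32 that are not from this thread.

```python
import time

def time_first_batch(make_loader):
    """Return the seconds taken to build a loader and fetch its first batch.

    `make_loader` is any zero-argument callable that returns an iterable.
    """
    start = time.perf_counter()
    loader = make_loader()
    next(iter(loader))  # blocks until the first batch is ready
    return time.perf_counter() - start

# Example usage with Gluon (assumes `dataset` is already defined):
#
#   from mxnet import gluon
#   for n in (0, 2, 8):
#       t = time_first_batch(
#           lambda: gluon.data.DataLoader(dataset, batch_size=32, num_workers=n))
#       print("num_workers={}: {:.1f} s to first batch".format(n, t))
```

Including num_workers=0 gives a baseline with no worker processes at all, which helps separate worker startup cost from plain file-reading cost.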


#3

A bit less, but still considerable.


#4

Is your dataset part of the AMI? Just wondering if this is related to AMI cold start (i.e. the AMI is stored on S3 and its blocks are only synced to the volume on a read miss).
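A quick way to test the cold-start theory is to read a small sample of files twice and compare the timings. This is a sketch under assumptions: the helper names are hypothetical, and the comparison is only meaningful if the sampled files have not been read since the instance booted.

```python
import time

def time_reads(paths):
    """Read every file in `paths` fully; return (seconds, total_bytes)."""
    start = time.perf_counter()
    total = 0
    for p in paths:
        with open(p, "rb") as f:
            total += len(f.read())
    return time.perf_counter() - start, total

def compare_first_and_second_read(paths):
    """Time two consecutive passes over the same files.

    The second pass is usually served from the OS page cache, so a much
    slower first pass points at slow initial storage access.
    """
    first, nbytes = time_reads(paths)
    second, _ = time_reads(paths)
    return first, second, nbytes

# Example usage (assumes `sample_paths` is a list of a few hundred image paths):
#
#   first, second, nbytes = compare_first_and_second_read(sample_paths)
#   print("first: {:.2f} s, second: {:.2f} s, bytes: {}".format(first, second, nbytes))
```

If the first pass is dramatically slower than the second, the delay likely comes from storage rather than from the DataLoader itself.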


#5

It is just a txt file with a bunch of rows in it. Each row corresponds to one entry in my dataset.