Is there an example of a data iterator which does a lot of multithreaded preprocessing and builds a queue for each GPU?
If you’re using the Gluon API, you can set `num_workers` on the `DataLoader` to use multi-processing with any type of `Dataset`. You typically want to set this to the number of CPUs available for optimal performance, which you can find with `multiprocessing.cpu_count()`. All data loading and preprocessing (e.g. data augmentation) will be performed in parallel across the worker processes and automatically added to a queue to be sent to the GPUs. Check out the Gluon data tutorial for an example of this.
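The mechanics can be sketched with just the standard library. This is a simplified stand-in for illustration, not Gluon's actual implementation; `augment` and `load_batches` are made-up names:

```python
import multiprocessing

def augment(sample):
    # Stand-in for real preprocessing / data augmentation work.
    return sample * 2

def load_batches(dataset, batch_size):
    # Like DataLoader with num_workers set: preprocess every sample in
    # parallel across one process per CPU, then group results into batches.
    with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
        processed = pool.map(augment, dataset)
    return [processed[i:i + batch_size]
            for i in range(0, len(processed), batch_size)]

if __name__ == "__main__":
    print(load_batches(list(range(8)), batch_size=4))
    # [[0, 2, 4, 6], [8, 10, 12, 14]]
```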
With the Module API, you can instead use multi-threading (as opposed to multi-processing) for data loading and augmentation via the `preprocess_threads` argument.
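The original question — multithreaded preprocessing feeding a queue per GPU — can be sketched with `threading` and `queue` from the standard library. The names and the round-robin assignment here are illustrative, not MXNet API:

```python
import queue
import threading

def start_pipeline(samples, num_gpus, num_threads=4, preprocess=lambda s: s * 2):
    # One output queue per GPU; preprocessing threads pull raw samples
    # from a shared input queue and round-robin results across GPUs.
    in_q = queue.Queue()
    gpu_queues = [queue.Queue() for _ in range(num_gpus)]
    for i, s in enumerate(samples):
        in_q.put((i, s))

    def worker():
        while True:
            try:
                i, s = in_q.get_nowait()
            except queue.Empty:
                return  # input exhausted
            gpu_queues[i % num_gpus].put(preprocess(s))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return gpu_queues
```

Each sample lands in exactly one GPU's queue because `Queue.get_nowait()` hands any given item to exactly one thread.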
Yeah, that leads to my real question — how do you guarantee each batch is only sent once? What if two threads/processes call `next` at the same time? There’s no lock on the actual index update. My current solution is:

```python
with self.rlock:
    self.index += 1
```
`multiprocessing.Value` works as well.
I’m not exactly sure what code you’re looking at. In the `DataLoader` in Gluon, the main process creates batches of indices that are then passed to the worker processes. A worker process fetches a batch of indices and constructs the batch of data by reading the dataset at those indices. This is the code if you’re interested: https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/data/dataloader.py#L215
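That scheme — the main process slices indices into batches, workers materialize the data — avoids any shared index counter entirely, because each batch of indices is handed to exactly one worker. A self-contained sketch, with threads standing in for the DataLoader's worker processes:

```python
import queue
import threading

def batchify(dataset, batch_size, num_workers=2):
    # Main process: create batches of *indices*, not data.
    index_batches = [list(range(i, min(i + batch_size, len(dataset))))
                     for i in range(0, len(dataset), batch_size)]
    work_q = queue.Queue()
    for bid, idx_batch in enumerate(index_batches):
        work_q.put((bid, idx_batch))
    out = {}

    def worker():
        # Worker: take a batch of indices, read the data at those
        # indices, and emit the assembled batch of data.
        while True:
            try:
                bid, idx_batch = work_q.get_nowait()
            except queue.Empty:
                return
            out[bid] = [dataset[i] for i in idx_batch]

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Reassemble batches in their original order.
    return [out[b] for b in sorted(out)]
```

Because `Queue.get_nowait()` pops each `(batch_id, indices)` item exactly once, no batch can be sent twice even with many workers.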