You need to write your own CustomDataset, which only needs to load a single item by index in its __getitem__() method (plus __len__(), so that the DataLoader knows how many items there are). Instead of loading all items into memory, it can keep a mapping from item index to the item's path on disk and load an item only when it is requested. See how ImageFolderDataset does this: it collects image paths in its _list_images method and loads an image only when it is actually needed.
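As a rough illustration of this lazy-loading pattern, here is a minimal sketch. It is written in plain Python so that it runs without MXNet installed; in a real project the class would subclass mxnet.gluon.data.Dataset. The names LazyFileDataset and make_demo_dir, and the file layout, are assumptions made up for this example.

```python
import os
import tempfile


class LazyFileDataset:
    """Hypothetical dataset: keeps only file paths, reads a file when indexed.

    With MXNet available, this would subclass mxnet.gluon.data.Dataset.
    """

    def __init__(self, root):
        # Collecting paths is cheap; the file contents stay on disk.
        self._paths = sorted(
            os.path.join(root, name)
            for name in os.listdir(root)
            if os.path.isfile(os.path.join(root, name))
        )

    def __getitem__(self, idx):
        # The actual load happens here, and only for the requested item.
        with open(self._paths[idx], "rb") as f:
            return f.read()

    def __len__(self):
        return len(self._paths)


def make_demo_dir(n_files=3):
    """Write a few small files so the sketch can be run end to end."""
    root = tempfile.mkdtemp()
    for i in range(n_files):
        with open(os.path.join(root, "item_%d.bin" % i), "wb") as f:
            f.write(b"payload-%d" % i)
    return root
```

Creating the dataset only lists the directory; no file is opened until an index is requested, so memory use does not grow with the number of items.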
Once you have your CustomDataset, use it with the default DataLoader and set the num_workers argument to a value greater than 0. The DataLoader will then spin up that many worker processes, each of which loads items from your CustomDataset. This is how you get multiprocess data loading from your CustomDataset without writing any multiprocessing code yourself.
You can learn more about how the Dataset/DataLoader combination works in MXNet from this tutorial.