num_workers should generally be set equal to the number of cores on your machine for maximum parallelization. If you increase it beyond that, you start to incur overhead from OS context switching.
Thanks @sad, that's what I thought. I used to set num_workers = multiprocessing.cpu_count() so this would scale up or down with the machine's hardware, but the numbers above are from a p3.8xlarge with 32 vCPUs, so I'm surprised that using 16 workers results in slower training than using 8.
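For what it's worth, that scaling approach can be combined with a cap, since (as noted above) more workers past a point just adds overhead. A minimal sketch; the helper name `default_num_workers` and the cap of 8 are made up for illustration, not from this thread:

```python
import multiprocessing


def default_num_workers(max_workers=8):
    """Scale the worker count with the machine, but cap it.

    Hypothetical helper: more workers than the machine (or the data
    pipeline) can keep busy just adds context-switch and IPC overhead,
    so we clamp cpu_count() to an empirically chosen ceiling.
    """
    return min(multiprocessing.cpu_count(), max_workers)


workers = default_num_workers()
```

The returned value can then be passed as `num_workers` to the DataLoader; the right cap still has to be found by benchmarking on the actual workload.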
Hi, the number of cores used for optimisation is something you need to fine-tune for the particular application at hand. More cores is not always faster; it really depends on the load each worker/CPU has to do. It comes down to a trade-off between communication cost and computation cost. I recall that (e.g.) Intel TBB used an internal algorithm to automatically decide the optimal number of cores for best performance on a specific job. I've followed a similar line of reasoning in the past when parallelizing for loops with OpenMP.
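A quick way to see that communication-vs-computation trade-off empirically is to time the same job at several worker counts and pick the fastest. A minimal sketch with Python's multiprocessing; the task `work` and the sizes are invented for illustration, and with a task this light the extra workers often don't help:

```python
import time
from multiprocessing import Pool


def work(x):
    # Deliberately small per-item task: when each item is this cheap,
    # inter-process communication dominates and adding workers can
    # actually slow the job down.
    return sum(i * i for i in range(x))


def time_pool(n_workers, items):
    # Time one run of the whole job with a pool of n_workers processes.
    start = time.perf_counter()
    with Pool(n_workers) as pool:
        pool.map(work, items)
    return time.perf_counter() - start


if __name__ == "__main__":
    items = [2000] * 200
    for n in (1, 2, 4, 8):
        print(f"{n} workers: {time_pool(n, items):.3f}s")
```

The same idea applies to DataLoader workers: sweep `num_workers` over a few values on the real training loop and keep the one that minimizes wall-clock time per epoch.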