Need to cross-validate num_workers?


#1

Hi, I’m training ResNet-50 v2 on FashionMNIST for 5 epochs with a per-GPU batch size of 1024. I’m seeing the following behavior with the DataLoader’s num_workers:

  1. with num_workers=8, the average epoch takes 4.2s
  2. with num_workers=16, the average epoch takes 6.2s

Is there a rationale for picking the right num_workers? Should it be tuned in a hyperparameter search?

#2

num_workers should be set equal to the number of cores on your machine for maximum parallelism. If you increase it further, you start to incur costs from the overhead of OS context switching.
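As a minimal sketch of that rule of thumb (the cap of 8 here is an arbitrary example value, not a recommendation):

```python
import os

def default_num_workers(cap=8):
    # Rule of thumb: one worker per core, but capped, since
    # oversubscribing cores adds context-switching overhead.
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return min(cores, cap)

num_workers = default_num_workers()
# pass this to DataLoader(..., num_workers=num_workers)
```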


#3

Thanks @sad, that’s what I thought. I used to set num_workers = multiprocessing.cpu_count() so it would scale up or down with the machine’s hardware, but the numbers above are from a p3.8xlarge with 32 vCPUs, so I’m surprised that 16 workers ends up training slower than 8.
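Rather than folding num_workers into the HP search, one option is to benchmark a few candidates up front and keep the fastest. A stdlib-only sketch of that sweep; load_batch is a hypothetical stand-in for per-batch loading work (in practice you would time a real epoch over your DataLoader for each candidate, since DataLoader uses worker processes, not threads):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_batch(i):
    # stand-in for one batch's I/O + decode work
    time.sleep(0.001)
    return i

def time_epoch(num_workers, num_batches=64):
    # Time one simulated "epoch" of batch loading with a worker pool.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(load_batch, range(num_batches)))
    return time.perf_counter() - start

# sweep a few candidate worker counts and keep the fastest
candidates = [2, 4, 8, 16]
timings = {w: time_epoch(w) for w in candidates}
best = min(timings, key=timings.get)
```

Because the sweep only needs a couple of warm-up epochs per candidate, it is much cheaper than adding another dimension to the HP search.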


#4

Hi, the number of cores used is something you need to fine-tune for the particular application at hand. More cores is not always faster; it really depends on the load each worker/CPU has to handle. It comes down to a trade-off between communication cost and computation cost. I recall that Intel TBB, for example, used an internal algorithm to automatically decide the optimal number of cores for best performance on a specific job. I’ve followed a similar line of reasoning in the past when parallelizing for loops with OpenMP.

Hope this helps.