Best choice of kvstore parameter in fit methods

saurabh3949 · October 31, 2017, 9:30pm

Hi,
I would like to discuss the best option for the kvstore parameter that is passed to the fit method of the Module API.

For single machine, single GPU:
If we don’t have GPU memory constraints, is it always faster to use device instead of local?

For single machine, multiple GPUs:
Again, if we don’t have GPU memory constraints, is device always better? The documentation says “When using a large number of GPUs, e.g. >=4, we suggest using device for better performance.”

For multiple machines, multiple GPUs:
For synchronous updates, which is better? dist_sync or dist_device_sync?

Thanks!

Jerry · November 1, 2017, 6:55am

In my experience training ResNet, device is usually faster. However, Mu said they found that Inception-style networks tend to be faster with local/dist_sync.
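
Since the faster option seems to depend on the network, one way to keep the choice easy to flip while benchmarking is a small helper that maps the setup to a kvstore string. This is only a sketch: `choose_kvstore` and its arguments are hypothetical names, not part of MXNet. The four strings it returns are the kvstore types discussed in this thread, and whether to aggregate on GPU is left as a flag you would set after timing both variants on your own model.

```python
# Hypothetical helper (not part of MXNet): maps a training setup to one of
# the kvstore type strings discussed in this thread.
def choose_kvstore(num_machines, gpus_per_machine, aggregate_on_gpu):
    """Return a kvstore type string suitable for the kvstore argument of fit.

    aggregate_on_gpu: True selects the 'device' variants (gradients are
    aggregated on the GPUs); False selects 'local' / 'dist_sync'.
    """
    if num_machines < 1:
        raise ValueError("num_machines must be at least 1")
    if gpus_per_machine < 0:
        raise ValueError("gpus_per_machine must not be negative")

    distributed = num_machines > 1
    # The 'device' variants only make sense when there is a GPU to aggregate on.
    use_device = aggregate_on_gpu and gpus_per_machine > 0

    if distributed:
        return "dist_device_sync" if use_device else "dist_sync"
    return "device" if use_device else "local"


if __name__ == "__main__":
    # Print the choice for a few example setups.
    setups = [(1, 1, True), (1, 4, True), (1, 4, False), (2, 4, True), (2, 4, False)]
    for machines, gpus, on_gpu in setups:
        print(machines, gpus, on_gpu, "->", choose_kvstore(machines, gpus, on_gpu))
```

The result would then be passed along the lines of `mod.fit(train_iter, num_epoch=1, kvstore=choose_kvstore(1, 4, True))`, where `mod` and `train_iter` stand in for your own module and data iterator. Timing an epoch with the flag set each way is probably the most reliable way to decide for a given network.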
