I am using MXNet to train ResNet-50 for 100 epochs on the ImageNet 2012 dataset. With 8 nodes, each with 4 V100 GPUs, and the default learning rate of 0.1, training makes no progress: top-1 train accuracy stays at ~0.1% and top-5 train accuracy stays at ~0.5%. I also tried a larger learning rate of 0.4, with the same result. For these runs I used --kv-store=dist_device_sync.
Then I trained on 4 V100 GPUs within a single node, still with the default learning rate of 0.1, and got 89.45% top-1 train accuracy and 97.39% top-5 train accuracy. For this run I used --kv-store=device.
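To make the difference in scale between the two runs concrete, here is a minimal sketch in plain Python (no MXNet calls). It uses the commonly cited linear scaling heuristic with a gradual warmup, where the learning rate is scaled in proportion to the global batch size and ramped up over the first steps. The per-GPU batch size of 32 and the warmup length are assumptions, since neither is stated above, and the helper names are hypothetical. This is a heuristic to compare against, not a confirmed fix for the stalled multi-node run.

```python
def global_batch_size(per_gpu_batch, gpus_per_node, nodes):
    """Total number of samples contributing to one synchronous update."""
    return per_gpu_batch * gpus_per_node * nodes


def scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling heuristic: LR grows in proportion to the global batch."""
    return base_lr * new_batch / base_batch


def warmup_lr(target_lr, step, warmup_steps, start_lr=0.0):
    """Linearly ramp from start_lr to target_lr over warmup_steps updates."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return target_lr
    return start_lr + (target_lr - start_lr) * step / warmup_steps


# ASSUMPTION: 32 samples per GPU; the actual batch size is not given above.
PER_GPU_BATCH = 32

single_node = global_batch_size(PER_GPU_BATCH, gpus_per_node=4, nodes=1)
multi_node = global_batch_size(PER_GPU_BATCH, gpus_per_node=4, nodes=8)

# The single-node run worked with LR 0.1, so use it as the reference point.
target = scaled_lr(base_lr=0.1, base_batch=single_node, new_batch=multi_node)

# ASSUMPTION: warm up over 1000 updates, starting from the single-node LR.
schedule = [warmup_lr(target, s, warmup_steps=1000, start_lr=0.1)
            for s in (0, 500, 1000)]
```

Under these assumptions the global batch is 8 times larger on 8 nodes, so the heuristic suggests a target learning rate 8 times the single-node value, reached gradually rather than applied from the first update.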
So how should I choose the learning rate for multi-node training? Has anyone run into the same issue and found a solution? Thanks.