Densenet121 stale gradients error in 1.5.0 pip install

hskramer · August 17, 2019, 3:11am

I tried running densenet121 from the Gluon model zoo for the first time since upgrading to 1.5.0, and I get a stale gradients error. This is a standard pip3 install; I have not built anything on this computer. If I set ignore_stale_grad=True, it doesn't work well, and there's a strange imbalance in GPU usage: one GPU is running at 80% and the other at 30%. The data is a standard rec file made from Caltech256. I have had this notebook since version 1.4.1 came out and have never had any problems. I should note that I do create a kvstore device and set kv=kvstore in the trainer.
