It’s an AlexNet model trained on the CIFAR-10 dataset. I’ve tried different batch sizes (64/128/512), but GPU usage stays at about 20% and the card runs at about 30% TDP (GPU memory usage is fairly high, 5886 MB out of 8 GB with batch size 512). It’s depressing.
Am I doing something wrong? Could anyone help? Thanks.
Also, on every batch you are calling .asscalar(), which forces a synchronous copy to the CPU. Moving this call to the beginning of the loop, rather than the end, should help, because the data will already have been loaded onto the GPU first:
if i > 0:
    curr_loss = nd.mean(loss).asscalar()
    moving_loss = (curr_loss if ((i == 0) and (e == 0))
                   else (1 - smoothing_constant) * moving_loss
                        + smoothing_constant * curr_loss)
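For reference, the `moving_loss` update above is just an exponential moving average, and the smoothing arithmetic can be checked in plain Python, independent of MXNet. The helper name `update_moving_loss` and the example values here are made up for illustration; only the formula comes from the snippet:

```python
def update_moving_loss(moving_loss, curr_loss, smoothing_constant=0.01, first=False):
    # On the very first batch of the first epoch, seed the average with the
    # current loss; afterwards, blend the new loss into the running average.
    if first:
        return curr_loss
    return (1 - smoothing_constant) * moving_loss + smoothing_constant * curr_loss

# The average drifts only slowly toward new loss values:
m = update_moving_loss(0.0, 2.0, first=True)  # 2.0
m = update_moving_loss(m, 1.0)                # 0.99 * 2.0 + 0.01 * 1.0 = 1.99
```

Because the smoothing constant is small, a single noisy batch barely moves the reported loss, which is why the tutorial uses it for progress reporting.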
Edit: not sure multiprocessing will work on Windows, as it does not support forking.
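For what it’s worth, on Windows the multiprocessing module spawns fresh interpreter processes (there is no fork()), which re-import the main script; the standard workaround is to guard process creation behind an `__main__` check. A generic sketch of that pattern (not the actual data-loading code from this thread):

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # On Windows each worker re-imports this module; the guard above keeps
    # the pool creation from running again inside the workers.
    with mp.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```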
Thanks a lot! I couldn’t run your multiprocessing code, but performance improved significantly after I moved .asscalar() to the front of each loop as you suggested!