It’s an AlexNet model trained on the CIFAR-10 dataset. I’ve tried different batch sizes (64/128/512), but GPU usage stays at about 20% and the card runs at about 30% TDP (GPU memory usage is fairly high, 5886 MB out of 8 GB with batch size 512). It’s depressing.
Am I doing something wrong? Could anyone help? Thanks.
Also, on every batch you are calling .asscalar(), which forces a synchronous copy to the CPU. Moving this call to the beginning of the loop, rather than the end, should help, because the data will already have been loaded onto the GPU first:
if i > 0:
    curr_loss = nd.mean(loss).asscalar()
    moving_loss = (curr_loss if ((i == 0) and (e == 0))
                   else (1 - smoothing_constant) * moving_loss
                        + smoothing_constant * curr_loss)
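For reference, the `moving_loss` update above is just an exponential moving average, and the smoothing arithmetic can be checked in plain Python, independent of MXNet. The helper name `update_moving_loss` and the example values here are made up for illustration; only the formula comes from the snippet:

```python
def update_moving_loss(moving_loss, curr_loss, smoothing_constant=0.01, first=False):
    # On the very first batch of the first epoch, seed the average with the
    # current loss; afterwards, blend the new loss into the running average.
    if first:
        return curr_loss
    return (1 - smoothing_constant) * moving_loss + smoothing_constant * curr_loss

# The average drifts only slowly toward new loss values:
m = update_moving_loss(0.0, 2.0, first=True)  # 2.0
m = update_moving_loss(m, 1.0)                # 0.99 * 2.0 + 0.01 * 1.0 = 1.99
```

Because the smoothing constant is small, a single noisy batch barely moves the reported loss, which is why the tutorial uses it for progress reporting.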
Edit: not sure multiprocessing will work on Windows, as it does not support forking.
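For what it’s worth, on Windows the multiprocessing module spawns fresh interpreter processes (there is no fork()), which re-import the main script; the standard workaround is to guard process creation behind an `__main__` check. A generic sketch of that pattern (not the actual data-loading code from this thread):

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # On Windows each worker re-imports this module; the guard above keeps
    # the pool creation from running again inside the workers.
    with mp.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```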
Thanks a lot! I couldn’t run your multiprocessing code, but performance improved significantly after I moved .asscalar() to the front of each loop as you suggested!