Hi, I'm training this public Gluon example on a p2 notebook instance: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/introduction_to_applying_machine_learning/gluon_recommender_system

Looking at nvidia-smi, both the GPU and GPU memory are under-utilized: memory sits at about 340 MiB out of ~11 GiB, and GPU utilization oscillates between 23% and 25%.

Is this expected?
I had a look at the example and you are right that the GPU utilization is rather low. One way to increase it is to increase the batch size and set num_workers=4 in gluon.data.DataLoader().
The main reason for the low GPU utilization is that the model in this example is very simple: it consists of just two embedding layers, a dropout layer, and one dense layer:

```python
with self.name_scope():
    self.user_embeddings = gluon.nn.Embedding(max_users, num_emb)
    self.item_embeddings = gluon.nn.Embedding(max_items, num_emb)
    self.dropout = gluon.nn.Dropout(dropout_p)
    self.dense = gluon.nn.Dense(num_emb, activation='relu')
```
So feeding data fast enough into the GPU probably becomes the major bottleneck in this example.