Improving GPU usage on the public SageMaker MXNet example

Hi, I’m training this public Gluon example on a p2 notebook: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/introduction_to_applying_machine_learning/gluon_recommender_system
Looking at nvidia-smi, both the GPU and its memory are under-utilized: memory usage is at 340 MiB out of about 11k MiB, and GPU utilization oscillates between 23% and 25%.
Is this expected?

I had a look at the example, and you are right that the GPU utilization is rather low. One way to increase it is to use a larger batch size and set num_workers=4 in gluon.data.DataLoader(), so that batches are prepared by several worker processes in parallel.
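As a rough sketch of that suggestion (not the notebook's actual code), the snippet below builds a DataLoader with a larger batch and four workers. The names `batches_per_epoch` and `make_loader`, and the batch size of 8192, are made up for illustration; only `gluon.data.DataLoader` and its `batch_size`, `shuffle`, `num_workers` and `last_batch` arguments come from MXNet. The small helper just shows how a larger batch cuts the number of batches the GPU has to wait for in each epoch.

```python
import math


def batches_per_epoch(num_samples, batch_size):
    """Batches in one epoch when the final partial batch is kept."""
    return math.ceil(num_samples / batch_size)


def make_loader(dataset, batch_size=8192, num_workers=4):
    """Build a Gluon DataLoader; batch_size here is an illustrative value."""
    # Imported lazily so the helper above also works without MXNet installed.
    from mxnet import gluon

    return gluon.data.DataLoader(
        dataset,
        batch_size=batch_size,    # larger batches give the GPU more work per step
        shuffle=True,
        num_workers=num_workers,  # worker processes that prepare batches in parallel
        last_batch="keep",        # keep the final, smaller batch
    )


# With a hypothetical 1,000,000 training samples:
print(batches_per_epoch(1_000_000, 1024))  # 977
print(batches_per_epoch(1_000_000, 8192))  # 123
```

Keep in mind that a much larger batch size can change convergence, so the learning rate may need retuning.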
The main reason for the low GPU utilization is that the model in this example is very simple: it consists of only 2 embedding layers and 1 dense layer:

with self.name_scope():
    self.user_embeddings = gluon.nn.Embedding(max_users, num_emb)
    self.item_embeddings = gluon.nn.Embedding(max_items, num_emb)
    self.dropout = gluon.nn.Dropout(dropout_p)
    self.dense = gluon.nn.Dense(num_emb, activation='relu')

So feeding data fast enough into the GPU probably becomes the major bottleneck in this example.
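If you want to check whether data loading really is the bottleneck, one option is to time how long the loop waits for each batch versus how long the training step takes. The helper below is a minimal sketch using only the standard library; `profile_loop` and `train_step` are hypothetical names, and `batches` can be any iterable, for example a gluon.data.DataLoader. Because MXNet executes operations asynchronously, `train_step` should block until the step has finished (for example by calling `mx.nd.waitall()` or `loss.asscalar()`), otherwise the compute time will be under-reported.

```python
import time


def profile_loop(batches, train_step):
    """Split wall-clock time into waiting-for-data and training-step time."""
    data_s = 0.0
    step_s = 0.0
    num_batches = 0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)      # time spent here is time the GPU sits idle
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)         # should block until the step has completed
        t2 = time.perf_counter()
        data_s += t1 - t0
        step_s += t2 - t1
        num_batches += 1
    total = data_s + step_s
    return {
        "num_batches": num_batches,
        "data_s": data_s,
        "step_s": step_s,
        "data_fraction": data_s / total if total > 0 else 0.0,
    }
```

A `data_fraction` close to 1 means the loop spends most of its time waiting for batches, in which case more workers or larger batches are likely to help more than any change to the model.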
