Understanding MXNet GPU Memory Allocation



I am trying to profile the memory usage of the MXNet RNN language-modeling benchmark (located in example/rnn/word_lm). I ran the example on an Ubuntu 16.04 machine with CUDA 9 and cuDNN 7, and it uses a total of about 700 MB of GPU memory. To pinpoint what that memory is allocated for, I added the following trace after the cudaMalloc calls in src/storage/pooled_storage_manager.h and src/storage/gpu_device_storage.h:

	// Trace each cudaMalloc call: report the size of this allocation and a
	// running total across all calls (static, so it persists between calls).
	static double accumulated_alloc_mem = 0.0;

	std::cout << "cudaMalloc from file " << __FILE__ << " is called.";
	std::cout << "\t" "Allocated Memory (MB): " << size * 1.0 / 1e6 << std::endl;
	std::cout << "\t" "Allocated Memory (Accumulated, MB): " <<
		(accumulated_alloc_mem += size * 1.0 / 1e6) << std::endl;
	std::cout << "\t" << dmlc::StackTrace() << std::endl;

However, I can only account for about 200 MB of the 700 MB total this way. Are there any other places where GPU memory is allocated? Could someone give me some hints on this? Thanks.
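For what it's worth, once the trace above is in place, its output can be aggregated per call site to get a rough breakdown. Here is a small sketch that parses the exact line format printed by the snippet (the file names and sizes in the sample log below are made up for illustration):

```python
import re
from collections import defaultdict

# Matches the trace line printed by the snippet above:
#   cudaMalloc from file <path> is called.\tAllocated Memory (MB): <size>
LINE_RE = re.compile(
    r"cudaMalloc from file (?P<file>\S+) is called\."
    r"\s*Allocated Memory \(MB\): (?P<mb>[0-9.]+)"
)

def breakdown_by_call_site(log_text):
    """Sum traced allocation sizes (MB) per source file that called cudaMalloc."""
    totals = defaultdict(float)
    for match in LINE_RE.finditer(log_text):
        totals[match.group("file")] += float(match.group("mb"))
    return dict(totals)

# Illustrative sample of what the trace output might look like.
sample = (
    "cudaMalloc from file src/storage/pooled_storage_manager.h is called."
    "\tAllocated Memory (MB): 128.5\n"
    "cudaMalloc from file src/storage/gpu_device_storage.h is called."
    "\tAllocated Memory (MB): 64.0\n"
    "cudaMalloc from file src/storage/pooled_storage_manager.h is called."
    "\tAllocated Memory (MB): 7.5\n"
)
print(breakdown_by_call_site(sample))
# {'src/storage/pooled_storage_manager.h': 136.0, 'src/storage/gpu_device_storage.h': 64.0}
```

This only accounts for allocations that go through the traced code paths, of course, which is exactly the gap being asked about.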



You can also do memory profiling using the MXNet profiler. The profiler output can be viewed in chrome://tracing. Here is an example.

Since operator execution and memory utilization are shown on the same timeline, you can see what was being executed when memory utilization increased.

Here is a tutorial on profiling.
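Since the profiler output is a Chrome-trace JSON file, you can also post-process it directly instead of eyeballing the timeline. The sketch below pulls the peak value of each counter series out of such a file; note that the exact counter names and series keys (e.g. "Memory: gpu/0", "bytes") are assumptions here and may differ across MXNet versions, so adjust them to match your actual profile file:

```python
import json

def peak_counters(trace_json_text):
    """Return the peak value observed for each counter series in a Chrome trace.

    Chrome-trace counter events carry ph == "C" and one or more value series
    in their "args" dict.
    """
    events = json.loads(trace_json_text)["traceEvents"]
    peaks = {}
    for ev in events:
        if ev.get("ph") != "C":  # "C" marks a counter sample
            continue
        for series, value in ev.get("args", {}).items():
            key = (ev["name"], series)  # one peak per (counter, series) pair
            peaks[key] = max(peaks.get(key, 0), value)
    return peaks

# Synthetic trace with a hypothetical GPU-memory counter, for illustration.
sample = json.dumps({"traceEvents": [
    {"ph": "C", "name": "Memory: gpu/0", "args": {"bytes": 100}},
    {"ph": "C", "name": "Memory: gpu/0", "args": {"bytes": 700}},
    {"ph": "C", "name": "Memory: gpu/0", "args": {"bytes": 300}},
]})
print(peak_counters(sample))
# {('Memory: gpu/0', 'bytes'): 700}
```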


Sorry, but this is not the profiling result I was hoping to get. It was my fault for not being precise at the very beginning: what I am really after is a breakdown of the allocated memory. Is there any way to obtain that information with existing tools?