I am trying to profile the memory usage of the MXNet RNN language-modeling benchmark (located in example/rnn/word_lm). I ran the example on a CUDA 9, cuDNN 7, Ubuntu 16.04 machine, and it uses about 700 MB of GPU memory in total. To pinpoint what this memory is allocated for, I added the following trace after the cudaMalloc calls in the file:
```cpp
// Trace the cudaMalloc function call.
static double accumulated_alloc_mem = 0.0;
std::cout << "cudaMalloc from file " << __FILE__ << " is called.";
std::cout << "\t" "Allocated Memory (MB): "
          << size * 1.0 / 1e6 << std::endl;
std::cout << "\t" "Allocated Memory (Accumulated, MB): "
          << (accumulated_alloc_mem += size * 1.0 / 1e6) << std::endl;
std::cout << "\t" << dmlc::StackTrace() << std::endl;
```
However, this only accounts for about 200 MB of the 700 MB total. Are there any other places where GPU memory is allocated? Could someone please give me some hints on this? Thanks.