Understanding MXNet GPU Memory Allocation



I am trying to profile the memory usage of the MXNet RNN language-modeling benchmark (located in example/rnn/word_lm). I ran the example on an Ubuntu 16.04 machine with CUDA 9 and cuDNN 7, and it uses a total of about 700 MB of GPU memory. To pinpoint what that memory is allocated for, I added the following trace after the cudaMalloc calls in src/storage/pooled_storage_manager.h and src/storage/gpu_device_storage.h:

	// Trace each cudaMalloc call: report the size of this allocation and a
	// running total across all calls (static, so it persists between calls).
	static double accumulated_alloc_mem = 0.0;

	std::cout << "cudaMalloc from file " << __FILE__ << " is called.";
	std::cout << "\t" "Allocated Memory (MB): " << size * 1.0 / 1e6 << std::endl;
	std::cout << "\t" "Allocated Memory (Accumulated, MB): " <<
		(accumulated_alloc_mem += size * 1.0 / 1e6) << std::endl;
	std::cout << "\t" << dmlc::StackTrace() << std::endl;

However, I can only account for about 200 MB of the 700 MB total this way. Are there any other places where GPU memory is allocated? Could someone give me some hints on this? Thanks.
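For what it's worth, once the trace above is in place, its output can be aggregated per call site to get a rough breakdown. Here is a small sketch that parses the exact line format printed by the snippet (the file names and sizes in the sample log below are made up for illustration):

```python
import re
from collections import defaultdict

# Matches the trace line printed by the snippet above:
#   cudaMalloc from file <path> is called.\tAllocated Memory (MB): <size>
LINE_RE = re.compile(
    r"cudaMalloc from file (?P<file>\S+) is called\."
    r"\s*Allocated Memory \(MB\): (?P<mb>[0-9.]+)"
)

def breakdown_by_call_site(log_text):
    """Sum traced allocation sizes (MB) per source file that called cudaMalloc."""
    totals = defaultdict(float)
    for match in LINE_RE.finditer(log_text):
        totals[match.group("file")] += float(match.group("mb"))
    return dict(totals)

# Illustrative sample of what the trace output might look like.
sample = (
    "cudaMalloc from file src/storage/pooled_storage_manager.h is called."
    "\tAllocated Memory (MB): 128.5\n"
    "cudaMalloc from file src/storage/gpu_device_storage.h is called."
    "\tAllocated Memory (MB): 64.0\n"
    "cudaMalloc from file src/storage/pooled_storage_manager.h is called."
    "\tAllocated Memory (MB): 7.5\n"
)
print(breakdown_by_call_site(sample))
# {'src/storage/pooled_storage_manager.h': 136.0, 'src/storage/gpu_device_storage.h': 64.0}
```

This only accounts for allocations that go through the traced code paths, of course, which is exactly the gap being asked about.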



You can also do memory profiling using the MXNet profiler. The profiler output can be viewed in chrome://tracing. Here is an example.

Since operator execution and memory utilization are shown on the same timeline, you can see what was being executed when memory utilization increased.

Here is a tutorial on profiling.
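Since the profiler output is a Chrome-trace JSON file, you can also post-process it directly instead of eyeballing the timeline. The sketch below pulls the peak value of each counter series out of such a file; note that the exact counter names and series keys (e.g. "Memory: gpu/0", "bytes") are assumptions here and may differ across MXNet versions, so adjust them to match your actual profile file:

```python
import json

def peak_counters(trace_json_text):
    """Return the peak value observed for each counter series in a Chrome trace.

    Chrome-trace counter events carry ph == "C" and one or more value series
    in their "args" dict.
    """
    events = json.loads(trace_json_text)["traceEvents"]
    peaks = {}
    for ev in events:
        if ev.get("ph") != "C":  # "C" marks a counter sample
            continue
        for series, value in ev.get("args", {}).items():
            key = (ev["name"], series)  # one peak per (counter, series) pair
            peaks[key] = max(peaks.get(key, 0), value)
    return peaks

# Synthetic trace with a hypothetical GPU-memory counter, for illustration.
sample = json.dumps({"traceEvents": [
    {"ph": "C", "name": "Memory: gpu/0", "args": {"bytes": 100}},
    {"ph": "C", "name": "Memory: gpu/0", "args": {"bytes": 700}},
    {"ph": "C", "name": "Memory: gpu/0", "args": {"bytes": 300}},
]})
print(peak_counters(sample))
# {('Memory: gpu/0', 'bytes'): 700}
```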


Sorry, but this is not the profiling result I was hoping to get. It was my fault for not being precise at the very beginning: what I am really after is a breakdown of the allocated memory. Is there any way to obtain that information with existing tools?