In my program, I found that MXNet takes 900 MB of GPU memory when the program starts, and then usage drops to 300 MB. Is that normal?
As you can see in the logs, at start-up MXNet runs cuDNN auto-tuning to find the most efficient convolution algorithm. The tuning step uses a certain amount of extra memory, and once it is done, usage falls back to the amount your model actually needs.
src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
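The variable named in that log message can be exported in the shell, or set from inside the script. Here is a minimal Python sketch of the second option. The helper name `disable_cudnn_autotune` is made up for illustration, and I am assuming the variable should be set before `mxnet` is imported, which is the safe ordering.

```python
import os


def disable_cudnn_autotune():
    """Hypothetical helper: turn off cuDNN auto-tuning for this process.

    MXNET_CUDNN_AUTOTUNE_DEFAULT is the variable named in the MXNet log
    message; "0" disables the performance tests at start-up.
    """
    os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"
    return os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"]


# Call this first, then import MXNet (assumption: setting the variable
# before the import guarantees the library sees it).
disable_cudnn_autotune()
# import mxnet as mx
```

With tuning disabled, the start-up memory spike should go away, but the convolution algorithm chosen may not be the fastest one for your layers.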
Thank you. After running “export MXNET_CUDNN_AUTOTUNE_DEFAULT=0”, it works.
However, since I work on an embedded platform, I would also like convolutions to keep using the most efficient algorithm. Is there a way to run the auto-tuning once and record the result for later use, or to set an upper limit on the memory the auto-tuning may use?