When to set CUDNN_AUTOTUNE_DEFAULT to 0?

anirudh2290 · October 15, 2018, 7:21pm

We know that CUDNN_AUTOTUNE_DEFAULT (full name MXNET_CUDNN_AUTOTUNE_DEFAULT) is set to 1 by default. As documented here: https://mxnet.incubator.apache.org/faq/env_var.html, when it is set to 1, MXNet chooses the fastest algorithm for the Convolution/Deconvolution operators by running performance tests. Once an algorithm is chosen, it is cached, keyed on the input shape, output shape and weight shape, among other factors (compute dtype, compute capability, etc.). This avoids rerunning the performance tests when the same combination of input shape, output shape and weight shape is used again. However, algorithm selection is triggered again for any combination of input shape, weight shape and output shape that has not been seen before.
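To make the caching behaviour concrete, here is a toy model in plain Python. It is not MXNet's implementation: the class `ToyAlgoCache`, its `select` method and the `algo_N` placeholder names are all hypothetical, and the cache key is simplified to the three shapes plus a dtype string. It only illustrates that a fixed shape triggers one tuning run while every previously unseen shape triggers another.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]
Key = Tuple[Shape, Shape, Shape, str]


class ToyAlgoCache:
    """Hypothetical stand-in for a per-shape algorithm cache (not MXNet code)."""

    def __init__(self) -> None:
        self._cache: Dict[Key, str] = {}
        self.tuning_runs = 0  # how many times the "performance test" ran

    def select(self, input_shape: Shape, weight_shape: Shape,
               output_shape: Shape, dtype: str = "float32") -> str:
        key = (tuple(input_shape), tuple(weight_shape), tuple(output_shape), dtype)
        if key not in self._cache:
            # Cache miss: this is where the expensive benchmarking would happen.
            self.tuning_runs += 1
            self._cache[key] = f"algo_{self.tuning_runs}"  # placeholder result
        return self._cache[key]


cache = ToyAlgoCache()
weight = (64, 3, 3, 3)  # 3x3 convolution, stride 1, no padding

# 100 forward calls with one fixed shape: tuned once, then served from cache.
for _ in range(100):
    cache.select((8, 3, 224, 224), weight, (8, 64, 222, 222))
fixed_runs = cache.tuning_runs

# 100 forward calls, each with a new input width: every call is a cache miss.
for width in range(300, 400):
    cache.select((8, 3, 224, width), weight, (8, 64, 222, width - 2))
varied_runs = cache.tuning_runs - fixed_runs

print(fixed_runs, varied_runs)
```

In this toy model the fixed-shape loop pays the tuning cost once, while the variable-width loop pays it on every call.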

For a use case where the input shape, output shape and weight shape vary widely across forward calls, the cuDNN algorithm selection and performance tests run too often and become a performance bottleneck. In this case, we found that performance (latency per forward call) was much better with CUDNN_AUTOTUNE_DEFAULT set to 0.
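For reference, a minimal sketch of turning autotuning off from Python. It assumes the variable is spelled MXNET_CUDNN_AUTOTUNE_DEFAULT, as on the env_var page linked above, and sets it before MXNet is imported so that it is in place before any convolution is created. The `try`/`except` is only there so the snippet also runs on a machine without MXNet installed.

```python
import os

# Disable cuDNN autotuning. Set this before importing MXNet so the value is
# already in the process environment when convolution operators are created.
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"

try:
    import mxnet as mx  # noqa: F401  (optional here; only to show the ordering)
except ImportError:
    mx = None  # MXNet not installed; the environment variable is still set

print(os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"])
```

The same effect can be had from the shell by exporting the variable before launching the script. Whether 0 is actually faster depends on how often new shape combinations show up, so it is worth measuring latency with both settings.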

Starting this thread to capture other use cases where it was better to set CUDNN_AUTOTUNE_DEFAULT to 0.

ThomasDelteil · October 23, 2018, 8:21pm

Thanks for reporting this, @anirudh2290. I have had the same experience: running the benchmark can slow down the overall execution time.
