I haven’t been able to find thorough documentation on exactly how the cuDNN autotuning performance tests are run. These are the tests that pick the convolution algorithm, enabled with MXNET_CUDNN_AUTOTUNE_DEFAULT=1.
If I pass inputs of 3 different shapes through a network, will the tests run three times, once per shape? And when running inference on thousands of inputs, will they still run only once per distinct input shape rather than once per input?
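To make the question concrete, here's a toy model in plain Python of the behavior I'm hoping for. It is not MXNet code: `pick_conv_algo` and `infer` are made-up names, and the assumption that the tuning result is cached per input shape is exactly what I'm trying to confirm.

```python
from functools import lru_cache

# Hypothetical stand-in for cuDNN autotuning (NOT MXNet code).
# tune_calls counts how many times the expensive "performance test" step runs.
tune_calls = 0


@lru_cache(maxsize=None)
def pick_conv_algo(input_shape):
    """Pretend to benchmark candidate algorithms; the result is memoized per shape."""
    global tune_calls
    tune_calls += 1
    return f"algo_for_{input_shape}"  # placeholder for the chosen algorithm


def infer(batch_shape):
    # Only the first call for a given shape runs the "tuning"; later calls hit the cache.
    return pick_conv_algo(batch_shape)


shapes = [(1, 3, 224, 224), (1, 3, 320, 320), (1, 3, 512, 512)]
for i in range(3000):
    infer(shapes[i % len(shapes)])

print(tune_calls)  # 3 under this model
```

Under this model, 3000 inputs spread over 3 shapes trigger 3 tuning runs, not 3000. Is that roughly how MXNet behaves?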