How to set cudnn_tune argument

I see that the current CNN-related APIs have a cudnn_tune argument. However, the man page also says:

There are other options to tune the performance.
cudnn_tune : enable this option leads to higher startup time but may give faster speed. Options are
off : no tuning
limited_workspace : run test and pick the fastest algorithm that doesn’t exceed workspace limit.
fastest : pick the fastest algorithm and ignore workspace limit.
None (default): the behavior is determined by environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT . 0 for off, 1 for limited workspace (default), 2 for fastest.

Why do we need both an argument and an environment variable? What if their values do not match? This usage is very confusing to users.

The env variable is the default, which can be altered by the conv-specific parameter. This is a very common way of having a global setting that can be overridden locally.
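
To make the precedence concrete, here is a minimal sketch of the resolution rule as I understand it from the documentation quoted above. The helper resolve_cudnn_tune is hypothetical and written only for illustration; it is not a function in MXNet, and the real lookup happens inside the operator implementation.

```python
import os

# Mapping taken from the documentation quoted above.
_ENV_TO_MODE = {"0": "off", "1": "limited_workspace", "2": "fastest"}


def resolve_cudnn_tune(cudnn_tune=None, environ=None):
    """Hypothetical helper: return the effective tuning mode for one operator."""
    environ = os.environ if environ is None else environ
    if cudnn_tune is not None:
        # An explicit per-operator argument wins over the global setting.
        return cudnn_tune
    # Otherwise fall back to the env variable; "1" is used when it is unset.
    return _ENV_TO_MODE[environ.get("MXNET_CUDNN_AUTOTUNE_DEFAULT", "1")]


# Argument left as None, env variable unset: the documented default applies.
print(resolve_cudnn_tune(None, {}))
# Argument left as None, env variable set to 0: tuning is off everywhere.
print(resolve_cudnn_tune(None, {"MXNET_CUDNN_AUTOTUNE_DEFAULT": "0"}))
# Argument set explicitly: the env variable is ignored for this operator.
print(resolve_cudnn_tune("fastest", {"MXNET_CUDNN_AUTOTUNE_DEFAULT": "1"}))
```

So under this reading, the two values never need to "match": the env variable is only consulted when the argument is left as None.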

Why can’t we just set the default to limited_workspace in the API?

If this env variable is for local tuning, I don’t see why it should be exposed to users. What if I set fastest in the API and set env variable value to 1 (limited_workspace)?

The default, when the env variable is not specified, is limited_workspace, because most users care more about speed. To be honest, I’m not sure where the confusion comes from. There is a global setting that allows easy global configuration, and its default is the value that most users would prefer. This global value can be overridden on individual convolution operators to achieve very fine-tuned behavior. The global behavior can also be changed, without touching the network definition, by changing the env variable. For example, I had a case where, due to a CUDNN bug, I had to disable tuning, and I achieved that by simply setting the env variable instead of modifying every instance of my convolution operators in the model definition.
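
As a rough illustration of that workaround, the variable can be set from Python itself. One assumption here: the value has to be in place before the convolution operators are created, so setting it before importing mxnet is the conservative choice. The import is left commented out so the snippet stands alone.

```python
import os

# Disable cuDNN autotuning for every operator that leaves cudnn_tune as None.
# Assumption: this must run before MXNet creates the convolution operators,
# so it is placed ahead of the mxnet import.
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"  # 0 = off

# import mxnet as mx
# ... the existing model definition follows here, unchanged ...

print(os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"])
```

The same effect can be had by exporting the variable in the shell that launches the training script, which avoids touching the code at all.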

It is confusing because in a Python API, when an argument has a default value, you normally specify it explicitly in the definition:

Convolution(…, cudnn_tune=limited_workspace)

In the current API design, the default is None, while a separate environment variable is used to set the effective default to limited_workspace. ???

I understand that you want to change this behavior if there is a bug in CUDNN. But this seems like a debugging utility for developers, not something a user should be aware of. Having two mechanisms to control the same thing is always confusing to end users.

After some more digging into the code, I found that we do not expose the cudnn_tune argument in the Gluon API. That means Gluon users can only control this option through the environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT. If that’s the case, why don’t we just get rid of the cudnn_tune argument in the Python API altogether and use only the environment variable to control it?

I think the pattern where one changes a behavior at the global level and can customize it at the local level is widely accepted. Case in point: TensorFlow allows using variable_scope to define parameter values at the global level. However, these values can be explicitly customized when a specific operator is defined. For example, you can define the default initializer for all layers to be xavier, but for some layers you can use gaussian by explicitly specifying it. I feel the env variable follows a similar pattern.
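
A toy version of that pattern in plain Python, just to show the shape of it. Everything here (default_scope, make_layer, the "uniform" library default) is made up for illustration and is not TensorFlow's or MXNet's API.

```python
from contextlib import contextmanager

# Hypothetical library-wide default, used when nothing else is specified.
_defaults = {"initializer": "uniform"}


@contextmanager
def default_scope(**overrides):
    """Temporarily change the defaults for everything created inside the scope."""
    saved = dict(_defaults)
    _defaults.update(overrides)
    try:
        yield
    finally:
        # Restore the previous defaults when the scope exits.
        _defaults.clear()
        _defaults.update(saved)


def make_layer(name, initializer=None):
    """Use the explicit argument if given, otherwise the current default."""
    chosen = initializer if initializer is not None else _defaults["initializer"]
    return {"name": name, "initializer": chosen}


with default_scope(initializer="xavier"):
    conv1 = make_layer("conv1")                          # picks up the scope default
    conv2 = make_layer("conv2", initializer="gaussian")  # local override

print(conv1["initializer"], conv2["initializer"])
```

The None default on the argument plays the same role as cudnn_tune=None: it means "not specified here, use whatever the surrounding default is".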

With respect to this specific tuning parameter, I agree that setting a different value for an individual operator is quite rare. I would say that if, for example, I had a network that consisted of conv2d and conv1d and wanted to disable tuning on conv1d but keep it enabled on conv2d, I could technically implement my own custom Gluon block for convolution and customize the tuning variable. One can argue that this is too far-fetched a hypothetical application, and that’s a fair point; I wouldn’t disagree if the community thinks the argument should be removed because it pollutes the API. I do not, however, agree with your point of view that having both a global default and an operator-level setting is confusing. One is the global default, the other is a local override.