I am using the MXNet C++ API to train a neural network. When I compile MXNet (1.2.0, from a git clone) targeting different compute capabilities and architectures, I expected a performance boost from targeting a higher compute capability (building against CUDA 9.2). However, I did not observe any speedup.
What could explain that the computation speed of my neural network (an FCN) does not change between compute capability 3.0 and 7.2? The FCN uses only float32 computations.
I tried both with and without cuDNN (with MXNet running in "NaiveEngine" mode in both cases).
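For reference, this is roughly how I build MXNet; the flag names follow the 1.2.0 Makefile, and the `-gencode` values are what I vary between builds (the CUDA path and the specific arch pair shown here are just examples):

```shell
# Build MXNet 1.2.0 from source with CUDA and cuDNN enabled.
# CUDA_ARCH controls which compute capability the kernels are compiled for;
# e.g. swap sm_30/compute_30 for sm_72/compute_72 between the two builds.
make -j"$(nproc)" \
    USE_CUDA=1 \
    USE_CUDA_PATH=/usr/local/cuda \
    USE_CUDNN=1 \
    CUDA_ARCH="-gencode arch=compute_30,code=sm_30 -gencode arch=compute_30,code=compute_30"
```

At run time I select the engine with `export MXNET_ENGINE_TYPE=NaiveEngine`.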