Do the C++ APIs work at all?


#1

After failing to get a simple image classification network to train with the C++ APIs, I decided to try the examples. The InceptionBN model doesn’t run at all; it crashes at InferArgsMap.
The GoogleNet example (which, by the way, is not modeled exactly after the original GoogLeNet) doesn’t learn anything and hovers around 0.5 accuracy after 20 epochs. And that’s MNIST!
The mlp model works fine. I didn’t try the rest of them.
My question is: has anyone else failed to get the C++ examples to work? They have many discrepancies, and not all of them seem to work. Are there any known issues with the C++ APIs, or am I doing something wrong here?
Thank you for your help!

Oh, by the way, all my experiments were run on a GTX 1080 with CUDA 10 (I don’t know if that’s relevant, but I’m trying to rule out a problem with my own setup).


#2

I have since figured it out.

To anyone who might be asking the same question, the GoogleNet C++ example does not converge because:

  1. The weights have not been initialized, AND
  2. The convolution layers are not followed by batch normalizations

This results in predictions that blow up to +inf and -inf, and in gradients that explode.
Solution:

  • Initialize the weights correctly (I used Xavier with the average factor type and a scale of 2).
  • Add a batch normalization layer after each convolution. This also means you can no longer call SimpleBind to get the executor, because you need to infer the aux_maps as well: the BatchNorm operation has 2 auxiliary states, namely moving_mean and moving_var. You can call InferExecutorArrays to infer all the shapes for you.
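
To make the Xavier settings above concrete, here is a minimal, library-free sketch of the arithmetic as it is commonly defined: scale = sqrt(magnitude / factor), where the factor is the fan-in, the fan-out, or their average. The names FactorType, XavierScale and XavierUniform are made up for this illustration and are not the MXNet C++ API; sampling from a uniform distribution (rather than a Gaussian) is also an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical helper types for illustration only (not MXNet identifiers).
enum class FactorType { kAvg, kIn, kOut };

// Xavier scale sqrt(magnitude / factor) for a weight whose shape is laid out
// as (out_channels, in_channels, kernel_h, kernel_w, ...).
double XavierScale(const std::vector<std::size_t>& shape, FactorType type,
                   double magnitude) {
  if (shape.size() < 2) {
    throw std::invalid_argument("Xavier needs at least a 2-D weight shape");
  }
  // Product of the spatial kernel dimensions (1.0 for a dense layer).
  double receptive_field = 1.0;
  for (std::size_t i = 2; i < shape.size(); ++i) {
    receptive_field *= static_cast<double>(shape[i]);
  }
  const double fan_out = static_cast<double>(shape[0]) * receptive_field;
  const double fan_in = static_cast<double>(shape[1]) * receptive_field;

  double factor = (fan_in + fan_out) / 2.0;  // "average" factor type
  if (type == FactorType::kIn) factor = fan_in;
  if (type == FactorType::kOut) factor = fan_out;
  return std::sqrt(magnitude / factor);
}

// Fills a flat weight buffer with uniform samples drawn from [-scale, scale).
std::vector<float> XavierUniform(const std::vector<std::size_t>& shape,
                                 FactorType type, double magnitude,
                                 unsigned seed) {
  const float scale = static_cast<float>(XavierScale(shape, type, magnitude));
  std::size_t count = 1;
  for (std::size_t dim : shape) count *= dim;

  std::mt19937 rng(seed);
  std::uniform_real_distribution<float> dist(-scale, scale);
  std::vector<float> weights(count);
  for (float& w : weights) w = dist(rng);
  return weights;
}
```

For a 3x3 convolution with 3 input and 64 output channels, XavierScale({64, 3, 3, 3}, FactorType::kAvg, 2.0) evaluates sqrt(2 / 301.5), so the weights start out small enough that the activations do not overflow on the first forward pass.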

As for the InceptionBN model, it fails at InferArgsMap. To fix it, first the input image size needs to be 299 (not 224); alternatively, you can reduce the last pooling kernel from 8 to 2. Also, the Convolution class constructor is wrong and you should not use it, because it includes a mandatory bias symbol and also takes a boolean no_bias parameter. I ended up instantiating the layer with Operator("Convolution") instead. Lastly, you also need to bind to an executor by providing the aux_map, since this model also has many layers with auxiliary states.
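
The input-size fix comes down to window arithmetic: a pooling kernel of 8 only fits on a feature map that is at least 8 wide after padding. Below is a small sketch of the usual floor-based output-size formula, out = floor((in + 2*pad - kernel) / stride) + 1. OutputSize is a hypothetical helper, not an MXNet function, and the feature-map width of 5 mentioned afterwards is an illustrative value, not one measured from the example.

```cpp
// Output length along one spatial axis for a convolution or pooling window,
// using the floor-based ("valid") convention. Returns 0 when the arguments
// are invalid or the window does not fit on the padded input, which is the
// situation where shape inference has no valid result.
int OutputSize(int input, int kernel, int stride, int pad) {
  if (input <= 0 || kernel <= 0 || stride <= 0 || pad < 0) return 0;
  const int padded = input + 2 * pad;
  if (padded < kernel) return 0;  // window is larger than the padded input
  return (padded - kernel) / stride + 1;
}
```

With this helper, OutputSize(8, 8, 1, 0) is 1, OutputSize(5, 8, 1, 0) is 0 because the kernel does not fit, and OutputSize(5, 2, 1, 0) is 4. That is why either a larger input image or a smaller final pooling kernel lets shape inference succeed.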

And that’s about it! Good luck!