How to train models with multiple GPUs in C++


mx.mod.Module provides a convenient high-level API for model training in Python. However, I need to train my models in a pure C++ environment. Is it possible to use multiple GPU devices with the interfaces in cpp-package/include/mxnet-cpp?
So far I can only find the Executor in cpp-package/include/mxnet-cpp/executor.h, which supports single-GPU training.


I am not a C++ binding expert, but looking through the API I don’t see an obvious way of doing that out of the box either. If you wanted to implement data parallelism yourself (training a copy of the same model on each GPU in parallel, which effectively lets you increase your overall batch size), you could proceed in the following way:

  • Initialize your model on each GPU
  • Split each training batch evenly and copy one slice to each GPU
  • Run the forward pass on each GPU
  • Compute the gradients on each GPU
  • Aggregate the gradients across GPUs and update the model weights on each GPU

This is effectively what the Module API does.
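
To make the steps above concrete, here is a minimal sketch of the control flow in plain C++. It deliberately does not call anything from mxnet-cpp: the "devices" are simulated with plain structs, and the model is a hypothetical one-weight toy (y = w * x) trained with mean squared error. All names (Replica, Shard, SplitBatch, ShardGradient, TrainStep, TrainToy) are made up for illustration. In a real program, each replica would presumably be an Executor bound to its own GPU context, and the gradient averaging would need an explicit copy between devices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One model replica per simulated device (assumption: stands in for one
// executor per GPU). The toy model is y = w * x with a single weight.
struct Replica {
  double w = 0.0;
};

// The slice of a batch assigned to one device.
struct Shard {
  std::vector<double> x;
  std::vector<double> y;
};

// Split the batch evenly across the devices.
std::vector<Shard> SplitBatch(const std::vector<double>& x,
                              const std::vector<double>& y,
                              std::size_t num_devices) {
  assert(x.size() == y.size());
  assert(num_devices > 0 && x.size() % num_devices == 0);
  const std::size_t per_device = x.size() / num_devices;
  std::vector<Shard> shards(num_devices);
  for (std::size_t d = 0; d < num_devices; ++d) {
    const auto begin = static_cast<std::ptrdiff_t>(d * per_device);
    const auto end = begin + static_cast<std::ptrdiff_t>(per_device);
    shards[d].x.assign(x.begin() + begin, x.begin() + end);
    shards[d].y.assign(y.begin() + begin, y.begin() + end);
  }
  return shards;
}

// Forward pass plus gradient of the mean squared error on one shard:
// d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x).
double ShardGradient(const Replica& replica, const Shard& shard) {
  double grad = 0.0;
  for (std::size_t i = 0; i < shard.x.size(); ++i) {
    const double pred = replica.w * shard.x[i];
    grad += 2.0 * (pred - shard.y[i]) * shard.x[i];
  }
  return grad / static_cast<double>(shard.x.size());
}

// Average the per-device gradients, then apply the same SGD update to
// every replica so that all copies of the weights stay identical.
void TrainStep(std::vector<Replica>& replicas,
               const std::vector<Shard>& shards, double lr) {
  assert(replicas.size() == shards.size());
  double sum = 0.0;
  for (std::size_t d = 0; d < replicas.size(); ++d) {
    sum += ShardGradient(replicas[d], shards[d]);
  }
  const double avg = sum / static_cast<double>(replicas.size());
  for (Replica& r : replicas) {
    r.w -= lr * avg;
  }
}

// Train on a fixed toy batch generated from y = 2 * x and return the
// weight held by the first replica.
double TrainToy(std::size_t num_devices, int steps, double lr) {
  const std::vector<double> x = {1.0, 2.0, 3.0, 4.0};
  const std::vector<double> y = {2.0, 4.0, 6.0, 8.0};
  std::vector<Replica> replicas(num_devices);  // identical initial weights
  const std::vector<Shard> shards = SplitBatch(x, y, num_devices);
  for (int s = 0; s < steps; ++s) {
    TrainStep(replicas, shards, lr);
  }
  return replicas[0].w;
}
```

Because the shards have equal size, averaging the per-shard gradients gives the same value as the gradient over the whole batch, so TrainToy(1, ...) and TrainToy(2, ...) follow the same trajectory up to floating-point rounding. This sketch runs the devices one after another; with real GPUs the per-device forward and backward passes would run concurrently and only the aggregation step would synchronize them.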