Documentation Request: Model Parallelism Tutorial


Note: I have been to and read

which are the two tutorials about model parallelism.

They link to this git repo:

which has some files that let you train on the CIFAR-10 and MNIST datasets using a lot of nifty arguments.

However, this code is written in a very esoteric way (from the perspective of someone new to MXNet).

So my question, or topic, is basically a request for a tutorial on using multiple GPUs.
Something simple, like initializing a simple network with Gluon and then training it on multiple GPUs.

Also, an explanation of what the following is, and how to use it,

module = mx.module.Module(context=[mx.gpu(0), mx.gpu(2)], ...)

would be nice, as the current isolated documentation does not make it very clear (again, from the point of view of a newcomer; perhaps it makes more sense to those who are more experienced with MXNet).
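
For what it is worth, passing a list of contexts to `mx.module.Module` gives data parallelism rather than model parallelism: each listed device holds a full copy of the parameters, every batch is sliced across the devices, and the per-device gradients are combined. The sketch below only illustrates that idea in plain Python. It makes no MXNet calls, the "devices" are simulated strings, and the function names (`split_batch`, `grad_sum`, `data_parallel_step`) are hypothetical.

```python
# Conceptual sketch only: plain Python, no MXNet calls.
# The devices are simulated and all function names are hypothetical.

def split_batch(batch, num_devices):
    """Slice a batch into equal contiguous chunks, one per device."""
    if len(batch) % num_devices != 0:
        raise ValueError("batch size must be divisible by the number of devices")
    size = len(batch) // num_devices
    return [batch[i * size:(i + 1) * size] for i in range(num_devices)]

def grad_sum(w, samples):
    """Sum of d/dw (w*x - y)^2 over the given (x, y) samples."""
    return sum(2.0 * (w * x - y) * x for x, y in samples)

def data_parallel_step(w, batch, devices, lr):
    """One SGD step with the batch sliced across the simulated devices."""
    slices = split_batch(batch, len(devices))
    # Every device holds the same w and only sees its own slice of the batch.
    per_device = [grad_sum(w, s) for s in slices]
    # Gradients are summed across devices, then averaged over the full batch.
    total = sum(per_device)
    return w - lr * total / len(batch), per_device

batch = [(float(x), 3.0 * x) for x in range(1, 9)]  # targets follow y = 3x
devices = ["gpu(0)", "gpu(2)"]  # stand-ins for mx.gpu(0) and mx.gpu(2)
w_new, per_device = data_parallel_step(0.0, batch, devices, lr=0.01)
```

The summed per-device gradients equal the gradient of the whole batch, which is why adding devices in this way changes throughput but not the result of the update.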

I appreciate your assistance and clarification in advance.



There are some well-written tutorials for multi-GPU training here:
Maybe you want to also read the first few chapters to get the basics of the Gluon API.


I think having documentation reduces the friction for experienced users. Reading docs is usually faster than reading a tutorial.


It is always a trade-off. Great documentation will be faster as a reference for those who are already familiar with the library and API. However, for those starting to learn the library (which is currently in transition from Symbol to Gluon), the current documentation is not yet accessible. So both are needed. Perhaps the best example of a well-documented language is Mathematica, which has extensive documentation at a variety of depths, as well as examples.


Thank you for sending me that link. I had not seen it before. I have read the previous Gluon tutorials, but even after reading and implementing those (as well as other tests that I have tried), I cannot confidently use the documentation or be sure that I am implementing something in the “proper” way.


+1 vote for a model parallelism tutorial. It is really important and currently not covered extensively. For example, medical segmentation tasks require 3D convolutions, where the memory bottleneck is a big problem.


There is a model-parallel example using the Module API… I do agree that a tutorial would be much better…
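
To make the distinction in this thread concrete, here is a rough plain-Python sketch of why model parallelism helps with per-device memory. It is not taken from the example mentioned above and makes no MXNet calls. The layer sizes and the layer-to-device placement are made up, and both function names are hypothetical. It only counts parameters; the activations each layer produces would follow the same placement.

```python
# Conceptual sketch only: plain Python, no MXNet calls; all names are hypothetical.

def params_per_device_data_parallel(layer_params, num_devices):
    """Data parallelism: every device keeps a full copy of every layer."""
    return [sum(layer_params)] * num_devices

def params_per_device_model_parallel(layer_params, placement, num_devices):
    """Model parallelism: each layer lives on exactly one device."""
    totals = [0] * num_devices
    for count, device in zip(layer_params, placement):
        totals[device] += count
    return totals

layer_params = [4_000_000, 6_000_000, 6_000_000, 4_000_000]  # made-up layer sizes
placement = [0, 0, 1, 1]  # made-up assignment: first two layers on device 0
data_parallel = params_per_device_data_parallel(layer_params, 2)
model_parallel = params_per_device_model_parallel(layer_params, placement, 2)
```

With these made-up numbers, each device stores all 20 million parameters under data parallelism but only 10 million under model parallelism, at the cost of passing activations between devices at the layer boundary.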