I wrote a program that implements an algorithm called distributed randomized gradient descent (DRGD). The algorithm has some internal variables that are used to calculate the step lengths. The training algorithms should be much more complex than DRGD, so there should be more internal variables. If we preserve these variables, we can pause training to test the model, and then resume training again.
If you want to store some data across multiple devices (GPUs or machines), you can use KVStore. Here is the tutorial on how to use it.
Please note that KVStore is considered quite an advanced feature and should be used with care.
I am not sure, but what you call a "Trainer" in the MXNet world may actually be called an "Optimizer". So please consider reading this API page as well.
This was the wrong question to ask. My bad, sorry.