Efficient way of saving gluon.Trainer?


What is the best way to save a gluon.Trainer? I need to update a pre-trained model by training it on a new dataset; however, if I recreate the Trainer, it starts over with the initial learning rate, which is a problem for optimizers such as AdaGrad that adapt the learning rate based on how frequently each feature occurs. I could not find a method such as save_params for Trainer, so please let me know if there is an easy way to save it. Thanks!


From the documentation:


you can use save_states(fname) to save the trainer's state, and then load_states(fname) to restore it to its previous configuration.


trainer = gluon.Trainer(mynet.collect_params(), 'adam', {'learning_rate': lr})

flname = r'trainer_adam.states'
trainer.save_states(flname)

then restore:

trainer.load_states(flname)


edit: The save command works, but when trying to restore the trainer with load_states I get an error:

AttributeError                            Traceback (most recent call last)
<ipython-input-13-a93de5be24c4> in <module>()
      1 with autograd.record():
----> 2     trainer.load_states(r'../saved_models/resunet-trainer-epoch-41-stats.states')

/home/foivos/mxnet/gluon/trainer.pyc in load_states(self, fname)
    224             Path to input states file.
    225         """
--> 226         if self._update_on_kvstore:
    227             self._kvstore.load_optimizer_states(fname)
    228             self._optimizer = self._kvstore._updater.optimizer

AttributeError: 'Trainer' object has no attribute '_update_on_kvstore'

edit 2: Without fully understanding what is going on (I am learning Gluon/MXNet these days), it seems you need to call the step operation at least once in order to create the '_update_on_kvstore' attribute. After performing at least one trainer.step(Nbatch), loading states works normally. You also need to perform a trainer.step(Nbatch) operation before saving states for the first time.

edit 3: After updating to the latest version of MXNet, v1.1.0, there is no problem loading previously saved states directly (the error described above no longer appears). Because of deferred initialization, you do need to run a single forward pass before updating the parameters, so the optimizer knows the correct dimensions of the layers (I got an error calling trainer.load_states('some_flname.states') without running a single forward pass first). I think it relates to this issue