The only way to customize an optimizer's L2 regularization per parameter (e.g., to amplify regularization of the embedding layer only) is to call set_wd_mult on the optimizer. However, set_wd_mult is not exposed through the Module API, and wd_mult cannot be set through the optimizer parameters passed into the Module.fit() call. As a result, the only way to customize L2 regularization appears to be creating and initializing the optimizer outside of the module and passing the instance into fit(). A simple modification to the optimizer's constructor would allow wd_mult to be passed in directly. Any thoughts on this?
There really is no good way of achieving this through the Module API. The Gluon API, however, makes this much simpler: it easily allows multiple trainers to be created, each managing a different parameter group with its own weight decay.