Help using mx.nd.sgd.update for R optimizers


#1

Optimizers in the R package perform a lengthy update of the state, which results in high memory consumption. As I understand it, garbage collection isn’t automatically performed on C objects, so a manual gc() is needed in certain circumstances, which impairs performance.
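To illustrate why the copying path is costly, here is a minimal base-R sketch (not MXNet code; the environment merely stands in for an NDArray handle) contrasting copy-on-modify updates, which allocate a fresh array each step, with reference-style in-place updates:

```r
# Copy-on-modify: returns a NEW vector; the caller's w is untouched,
# and each training step leaves a dead copy behind for gc() to collect.
update_by_copy <- function(w, g, lr) {
  w - lr * g
}

# Reference semantics: an environment is passed by reference, so the
# caller sees the mutation and no extra full-size copy accumulates.
update_in_place <- function(state, lr) {
  state$w <- state$w - lr * state$g
  invisible(state)
}

w <- c(1, 2, 3)
g <- c(0.1, 0.1, 0.1)
w2 <- update_by_copy(w, g, 0.5)  # w unchanged; w2 is a separate allocation

state <- new.env()
state$w <- c(1, 2, 3)
state$g <- c(0.1, 0.1, 0.1)
update_in_place(state, 0.5)      # state$w mutated in place
```

The in-place NDArray update functions aim for the second behaviour, mutating the executor's arrays directly instead of churning out temporaries.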

My current attempt to implement the in-place update of the weights on the executor, as done in the Python package, is the following:

mx.nd.sgd.mom.update(weight = weight, grad = grad, mom = state$mom, lr = lr, wd = wd, rescale_grad = rescale.grad, clip_gradient = clip_gradient, out = weight)

The weight and grad are the same function arguments as in the current optimizers and come from exec$ref.arg.arrays and exec$ref.grad.arrays. The call correctly updates both the weight in the executor and the state, but it returns an error message:

Error in mx.nd.sgd.mom.update(weight = exec$arg.arrays$fc_weight, grad = exec$grad.arrays$fc_weight, : ./ndarray.h:87: RCheck failed: ptr_->writable && !ptr_->moved Passing a read only NDArray to mutate function

The error can be bypassed by wrapping the call in try(), but during training R crashes randomly after a varying number of iterations (anywhere between 20 and 150) with no clear cause (memory consumption remains low and does not appear to be the culprit).

During model training, the parameter update is performed with the following call:

for (i in seq_len(ndevice)) { updaters[[i]](train.execs[[i]]$ref.arg.arrays, train.execs[[i]]$ref.grad.arrays) }
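For anyone unfamiliar with the pattern above: each updaters[[i]] is a closure that carries its own optimizer state for one device. A minimal base-R analogue (plain vectors instead of NDArrays; lr and momentum values are arbitrary) looks like this:

```r
# Each updater closes over its own state environment, mirroring how
# updaters[[i]] keeps per-executor momentum between calls.
make_updater <- function(lr = 0.1, momentum = 0.9) {
  state <- new.env()
  state$mom <- NULL
  function(weight, grad) {
    if (is.null(state$mom)) state$mom <- numeric(length(weight))
    state$mom <- momentum * state$mom - lr * grad
    weight + state$mom  # updated weight (a copy here, unlike NDArray out=)
  }
}

ndevice <- 2
updaters <- lapply(seq_len(ndevice), function(i) make_updater())

w <- c(1, 1)
w <- updaters[[1]](w, grad = c(1, 1))  # first step: mom = -lr * grad
```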

Any help in figuring out where the glitch is with this approach would be greatly appreciated; it seems nearly functional and would likely solve the problematic memory consumption.