Is there a simpler way of implementing the update rule (eq. (7) in the Meng et al. 2019 paper) than by re-writing C code similar to the existing update rule for plain SGD?
Update: problem solved. I managed to run the algorithm by writing the Riemannian gradient steps in Python, using gradients computed inside autograd.record() and manually setting the values of only the modified weight rows inside the training loop. Note: the set_data method does not allow sparse gradients to be used efficiently, so I had to implement that part myself.
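For readers hitting the same issue, here is a minimal framework-agnostic sketch of the approach described above, written in NumPy rather than MXNet for clarity. It assumes the manifold is the unit sphere (so the Riemannian gradient is the tangent-space projection of the Euclidean gradient, followed by a renormalization retraction); the function names and the choice of manifold are illustrative assumptions, not the paper's exact eq. (7). The row loop mirrors the "update only the modified rows" trick, since in the sparse setting only rows with nonzero gradient need to be touched.

```python
import numpy as np

def riemannian_sgd_step(x, g, lr):
    """One hypothetical Riemannian SGD step on the unit sphere.

    x  : current point on the sphere (unit-norm vector)
    g  : Euclidean gradient at x
    lr : learning rate
    """
    # Project the Euclidean gradient onto the tangent space at x.
    riem_grad = g - np.dot(x, g) * x
    # Take the step, then retract back onto the sphere by renormalizing.
    x_new = x - lr * riem_grad
    return x_new / np.linalg.norm(x_new)

# Toy embedding matrix with unit-norm rows, and a sparse gradient
# where only some rows are nonzero (illustrative data).
W = np.eye(4)                       # 4 embeddings on the unit sphere
G = np.zeros((4, 4))
G[1] = [0.2, -0.5, 0.1, 0.0]        # only row 1 received a gradient

# Update only the rows whose gradient is nonzero, as in the
# manual sparse update inside the training loop.
touched = np.flatnonzero(np.abs(G).sum(axis=1) > 0)
for r in touched:
    W[r] = riemannian_sgd_step(W[r], G[r], lr=0.1)
```

Because untouched rows are skipped entirely, this keeps the per-step cost proportional to the number of active rows rather than the full embedding table, which is what the built-in dense set_data path could not provide.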