Adagrad

https://en.diveintodeeplearning.org/chapter_optimization/adagrad.html

Could we skip accumulating $s_t$ and instead just set $s_t = g_t \odot g_t$ (the current squared gradient) at each time step? Then the effective learning rate would no longer decay over time.
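A quick sketch of the comparison, following the update rule in the linked chapter ($\eta$ is the learning rate, $\epsilon$ a small constant, and all operations elementwise):

$$
\text{Adagrad:}\qquad s_t = s_{t-1} + g_t \odot g_t,\qquad
x_t = x_{t-1} - \frac{\eta}{\sqrt{s_t + \epsilon}} \odot g_t
$$

$$
\text{Without accumulation:}\qquad s_t = g_t \odot g_t,\qquad
x_t = x_{t-1} - \frac{\eta}{\sqrt{g_t \odot g_t + \epsilon}} \odot g_t \;\approx\; x_{t-1} - \eta\,\operatorname{sign}(g_t)
$$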

In that case every step reduces to updating each weight by roughly $-\eta\,\operatorname{sign}(g_t)$, since $g_t / \sqrt{g_t \odot g_t + \epsilon} \approx \operatorname{sign}(g_t)$ whenever $|g_t| \gg \sqrt{\epsilon}$.
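A minimal numerical sketch of that claim (the values of `eta`, `eps`, and the example gradient are made up for illustration, not taken from the chapter's code):

```python
import numpy as np

eta = 0.1    # learning rate (assumed)
eps = 1e-6   # small constant for numerical stability (assumed)

g = np.array([0.5, -2.0, 0.003, -0.0001])  # example gradient

# "Adagrad without accumulation": s_t is just the current squared gradient.
s = g * g
step = eta * g / np.sqrt(s + eps)

# For coordinates with |g| >> sqrt(eps), the step is ~ eta * sign(g);
# only near-zero gradients get damped by eps.
print(step)              # ~[ 0.1, -0.1,  0.095, -0.00995]
print(eta * np.sign(g))  #  [ 0.1, -0.1,  0.1,   -0.1    ]
```

So dropping the accumulation turns the method into a per-coordinate sign update; it is precisely the accumulated $s_t$ that makes Adagrad shrink the effective rate on frequently updated coordinates.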