Adagrad

#1

https://en.diveintodeeplearning.org/chapter_optimization/adagrad.html
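
For reference, the Adagrad update from the linked chapter accumulates the elementwise squared gradient:

$$s_t = s_{t-1} + g_t \odot g_t, \qquad x_t = x_{t-1} - \frac{\eta}{\sqrt{s_t + \epsilon}} \odot g_t,$$

so the per-coordinate learning rate $\eta / \sqrt{s_t + \epsilon}$ keeps shrinking as $s_t$ grows.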

#2

Could we skip accumulating s_t and instead just set s_t = g_t ⊙ g_t (the elementwise squared gradient) at each time step? Then the learning rate wouldn't decay.
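
A minimal NumPy sketch contrasting the two rules (the function names, hyperparameters, and the toy quadratic objective are my own illustration, not from the chapter):

```python
import numpy as np

def adagrad_step(x, s, grad, eta=0.1, eps=1e-6):
    """Standard Adagrad: s accumulates squared gradients, so the
    effective step size eta / sqrt(s + eps) shrinks over time."""
    s = s + grad * grad
    x = x - eta * grad / np.sqrt(s + eps)
    return x, s

def no_accumulation_step(x, grad, eta=0.1, eps=1e-6):
    """The variant from #2: s_t = g_t * g_t is recomputed each step,
    so the effective step size never decays. Note that
    grad / sqrt(grad**2 + eps) is roughly sign(grad), so each
    coordinate moves by about +/- eta regardless of gradient size."""
    s = grad * grad
    return x - eta * grad / np.sqrt(s + eps)

# Toy objective f(x) = 0.5 * ||x||^2, whose gradient at x is x.
x1 = x2 = np.array([1.0, -2.0])
s = np.zeros_like(x1)
for t in range(100):
    x1, s = adagrad_step(x1, s, x1)    # decaying steps, converges
    x2 = no_accumulation_step(x2, x2)  # fixed-size steps, oscillates
print(x1, x2)
```

One consequence of the non-accumulating rule: $g_t / \sqrt{g_t \odot g_t + \epsilon} \approx \mathrm{sign}(g_t)$, so the update discards all gradient-magnitude information and each coordinate steps by roughly $\pm\eta$, which makes the iterate oscillate around the optimum instead of converging. Accumulating (or, as in RMSProp, taking an exponential moving average of) the squared gradients is what restores a decaying, magnitude-aware step.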