Adam method

#1

https://en.diveintodeeplearning.org/chapter_optimization/adam.html

#2

The bias correction always seems to set v_t = 1 · g_t, i.e. v_t = g_t. What's the point of this?

#3

I don't see why v_t is always g_t. Can you explain?
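To see what the correction actually does, here is a minimal Python sketch. It assumes the first-moment recurrence from the linked chapter, v_t = β1·v_{t-1} + (1 − β1)·g_t with v_0 = 0 and β1 = 0.9, and the correction v̂_t = v_t / (1 − β1^t). The function name `bias_corrected_momentum` is hypothetical. At t = 1 the correction gives v̂_1 = g_1 exactly, because v_1 = (1 − β1)·g_1. For t > 1, v̂_t is a normalized weighted average of all past gradients, so it equals g_t only when the gradients happen to be constant.

```python
def bias_corrected_momentum(grads, beta1=0.9):
    """Return bias-corrected first-moment estimates v_hat_t for t = 1..len(grads).

    Hypothetical helper: assumes v_0 = 0 and the Adam recurrences
    v_t = beta1 * v_{t-1} + (1 - beta1) * g_t and v_hat_t = v_t / (1 - beta1**t).
    """
    v = 0.0  # v_0 = 0, which biases early averages toward zero
    v_hats = []
    for t, g in enumerate(grads, start=1):
        v = beta1 * v + (1 - beta1) * g      # exponentially weighted moving average
        v_hats.append(v / (1 - beta1 ** t))  # undo the (1 - beta1**t) shrinkage
    return v_hats


# Constant gradients: every v_hat_t is (up to rounding) that same gradient.
print(bias_corrected_momentum([2.0, 2.0, 2.0]))

# Varying gradients: only v_hat_1 equals g_1; v_hat_2 mixes g_1 and g_2.
print(bias_corrected_momentum([1.0, 3.0]))
```

With gradients 1.0 then 3.0, the second estimate is (0.09 + 0.3) / 0.19 ≈ 2.05 rather than 3.0. The correction only rescales the early, zero-biased averages so that their weights sum to 1; it does not replace the average with the current gradient.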