This is a comparison of the MxNet Xavier Initialization and the PyTorch one.
I first posted this on the PyTorch forum
(excuse the formatting of the link; new users can only post 2 links in a question…),
but since it involves both frameworks, it needs expertise from both sides.
I am porting an MxNet implementation of a paper to PyTorch. The original code uses this initializer:
mx.init.Xavier(rnd_type="uniform", factor_type="avg", magnitude=0.0003)
This should be pretty much the same as PyTorch's Xavier uniform initialization, right?
But the docs and source code of the two frameworks define MxNet's magnitude and PyTorch's gain differently.
Even after scaling gain and magnitude to match, I still get different ranges of numbers,
in both cases starting from an empty array and initializing it.
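For reference, here is a minimal sketch of how I understand the two definitions to relate. It assumes my reading of the sources is right: MxNet with rnd_type="uniform" and factor_type="avg" samples from U(-s, s) with s = sqrt(magnitude / ((fan_in + fan_out) / 2)), while PyTorch's xavier_uniform_ samples from U(-a, a) with a = gain * sqrt(6 / (fan_in + fan_out)). Under that assumption the bounds match when gain = sqrt(magnitude / 3). The helper names are mine, not part of either library, and the formulas should be checked against the installed versions.

```python
import math

def fans(shape):
    """fan_in / fan_out for a weight of at least 2 dimensions (assumed common to both frameworks)."""
    if len(shape) < 2:
        raise ValueError("Xavier initialization needs at least 2 dimensions")
    receptive_field = math.prod(shape[2:])  # 1 for a plain 2-D weight
    return shape[1] * receptive_field, shape[0] * receptive_field

def mxnet_uniform_bound(shape, magnitude=3.0):
    """Bound s of U(-s, s), assuming rnd_type="uniform" and factor_type="avg"."""
    fan_in, fan_out = fans(shape)
    factor = (fan_in + fan_out) / 2.0
    return math.sqrt(magnitude / factor)

def torch_uniform_bound(shape, gain=1.0):
    """Bound a of U(-a, a), assuming the xavier_uniform_ definition."""
    fan_in, fan_out = fans(shape)
    std = gain * math.sqrt(2.0 / (fan_in + fan_out))
    return math.sqrt(3.0) * std

def gain_from_magnitude(magnitude):
    """Gain that makes the two bounds equal under the assumptions above."""
    return math.sqrt(magnitude / 3.0)

shape = (64, 32, 3, 3)      # hypothetical conv weight, only for illustration
magnitude = 0.0003          # value from the MxNet call in the question
gain = gain_from_magnitude(magnitude)

print("gain:", gain)
print("MxNet bound:  ", mxnet_uniform_bound(shape, magnitude))
print("PyTorch bound:", torch_uniform_bound(shape, gain))
```

With magnitude=0.0003 this gives a gain of roughly 0.01, and MxNet's default magnitude of 3 corresponds to a gain of 1.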
Am I missing something?
How can I make sure that both PyTorch and MxNet functions are initializing a specific input array in the same way?
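One caveat I am aware of: as far as I know the two frameworks use different random number generators, so even with matching bounds and the same seed the individual values will not be identical, only the distribution. If element-wise identical weights are needed, one option might be to draw the values once outside both frameworks and copy them in. The sketch below does that with the standard library only. The function name shared_xavier_uniform is hypothetical, and the bound formula is the assumed "avg" definition, sqrt(2 * magnitude / (fan_in + fan_out)).

```python
import math
import random

def shared_xavier_uniform(rows, cols, magnitude, seed=0):
    """Draw a rows x cols matrix from U(-bound, bound) as nested Python lists.

    rows plays the role of fan_out and cols of fan_in for a 2-D weight.
    """
    bound = math.sqrt(2.0 * magnitude / (rows + cols))
    rng = random.Random(seed)  # framework-independent generator
    values = [[rng.uniform(-bound, bound) for _ in range(cols)]
              for _ in range(rows)]
    return values, bound

weights, bound = shared_xavier_uniform(rows=4, cols=3, magnitude=0.0003)

# The same nested list can then be loaded into each framework, e.g.
#   torch.tensor(weights)   and   mx.nd.array(weights)
# so both sides start from exactly the same numbers.
largest = max(abs(w) for row in weights for w in row)
print("bound:", bound, "largest |w|:", largest)
```

To compare the built-in initializers themselves, checking the min and max of a large initialized array against the expected bound is probably more meaningful than comparing individual values.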