Model Selection, Underfitting and Overfitting

http://d2l.ai/chapter_multilayer-perceptrons/underfit-overfit.html

Is anybody having numeric precision (or something that I do not really understand) trying to overfit the linear regression?
If I try to overfit even creating an square feature matrix (so train loss should be aproximately zero) the train_loss gets bigger and bigger.
I also have realised that even trying the analytic solution (\mathbf X^T \mathbf X)^{-1}\mathbf X^T \mathbf y using the pseudoinverse of numpy converting to numpy using asnumpy() from mxnet.numpy the precission of the inverse is several orders of magnitude less that just using numpy.
So neither the analytic way nor using gradient descent, it seems that mxnet have problems with very small numbers (when for example maxdegree is 15 or 20 there are columns with numbers on the order of 1e-15). I know that maybe the solution with SGD the problem can be the learning rate that should be much more smaller but the analytic one I think that can be a problem.
Nobody has experienced this problem?