KFAC and KFRA are considerably more sophisticated optimization methods for neural networks. They use a Kronecker-factored approximation for each (layer-wise) block of the Gauss-Newton matrix. However, they require a specialized extra backward pass. I was wondering if anyone could give advice on the most appropriate way to implement this at the block level.
Additionally, they require matrix inverses, so I was wondering whether the corresponding functions (e.g.
trsm) have also been linked to work on the GPU via cuBLAS/cuSOLVER.
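For context, here is a minimal NumPy/SciPy sketch of what the block-level computation and the "inverse" step amount to. It assumes a single dense layer s = W a. The function names are hypothetical, and G is simply the second moment of whatever backpropagated vectors you pass in. KFAC would use gradients from sampled targets, and KFRA computes G recursively; neither of those backward passes is reproduced here. Damping is added to each factor separately as a simplification. The point is that the inverse of A ⊗ G never has to be formed: (A ⊗ G)^{-1} vec(V) = vec(G^{-1} V A^{-1}), and each small solve is a Cholesky factorization followed by two triangular solves. On the GPU those map to a potrf (cuSOLVER) plus trsm (cuBLAS) pair.

```python
import numpy as np
from scipy.linalg import solve_triangular


def kron_factors(acts, grads):
    """Kronecker factors for a dense layer s = W a (hypothetical helper).

    acts:  (N, d_in)  layer inputs a_n
    grads: (N, d_out) backpropagated vectors g_n w.r.t. the pre-activations s_n
    Returns A = E[a a^T] (d_in x d_in) and G = E[g g^T] (d_out x d_out).
    """
    n = acts.shape[0]
    A = acts.T @ acts / n
    G = grads.T @ grads / n
    return A, G


def chol_solve_trsm(M, B):
    """Solve M X = B for symmetric positive definite M.

    Uses the Cholesky factor M = L L^T and two triangular solves,
    i.e. the potrf + trsm pattern, instead of an explicit inverse.
    """
    L = np.linalg.cholesky(M)                    # lower-triangular L
    Y = solve_triangular(L, B, lower=True)       # L Y = B
    return solve_triangular(L.T, Y, lower=False) # L^T X = Y


def kron_precondition(V, A, G, damping=1e-3):
    """Return G_d^{-1} V A_d^{-1}, i.e. (A_d kron G_d)^{-1} applied to vec(V).

    V has the shape of W: (d_out, d_in). Damping is added to each factor
    separately (a simplification, not a specific published scheme).
    """
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])
    left = chol_solve_trsm(G_d, V)            # G_d^{-1} V
    return chol_solve_trsm(A_d, left.T).T     # (G_d^{-1} V) A_d^{-1}, A_d symmetric
```

Usage: with `A, G = kron_factors(acts, grads)` and the layer's weight gradient `V`, `kron_precondition(V, A, G)` gives the preconditioned update direction while only factorizing a d_in x d_in and a d_out x d_out matrix.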