Software recommendation for managing logs of deep learning experiments

Dear all,

Does anyone have a recommendation for organizing the output logs of multiple experiments? What platform are you using? I am not interested in logging individual runs (metrics, loss functions, etc.; I have that already), but in organizing a whole set of experiments. Too often I find myself overwhelmed by the number of experiments I am running, and at some point I lose track of them.

For example, each architecture I am testing has a set of configuration parameters and a set of output files associated with it. Usually, each experiment produces ~4-5 output logs (one after each learning rate reduction). Add distributed training, continuous source code updates (tracked with git), architecture modifications, and hyperparameter tweaking, and after XXX experiments it's a bit of chaos …

Currently I am leaning towards writing my own small library with visualization functions and a central Jupyter notebook/lab that reads the logs and model parameters, maybe linked to a spreadsheet that holds the unique ID of each run, etc. But I may well be reinventing the wheel here.
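
To make the idea concrete, here is a rough sketch of the kind of run index I have in mind, using only the Python standard library. Everything in it is an assumption for illustration: the function names (`register_run`, `load_index`, `current_git_commit`), the CSV columns, and the file names are hypothetical, not taken from any existing tool.

```python
import csv
import json
import subprocess
import uuid
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical column layout for the central index (the "spreadsheet").
FIELDS = ["run_id", "timestamp", "git_commit", "config", "log_files"]


def current_git_commit():
    """Return the current commit hash, or "unknown" if git is unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def register_run(index_path, config, log_files, git_commit=None):
    """Append one experiment to the CSV index and return its unique ID."""
    index_path = Path(index_path)
    run_id = uuid.uuid4().hex[:12]
    row = {
        "run_id": run_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "git_commit": git_commit if git_commit is not None else current_git_commit(),
        # Config and log list are stored as JSON strings so one row = one run.
        "config": json.dumps(config, sort_keys=True),
        "log_files": json.dumps(list(log_files)),
    }
    # Write the header only when the index is created or still empty.
    needs_header = not index_path.exists() or index_path.stat().st_size == 0
    with index_path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if needs_header:
            writer.writeheader()
        writer.writerow(row)
    return run_id


def load_index(index_path):
    """Read the index back as a list of dicts, decoding the JSON columns."""
    with Path(index_path).open(newline="") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        row["config"] = json.loads(row["config"])
        row["log_files"] = json.loads(row["log_files"])
    return rows
```

The training script would call `register_run("runs.csv", config, log_paths)` once per experiment, and the central notebook would call `load_index("runs.csv")` to filter runs by architecture or commit before opening the individual logs.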

Any suggestions are most welcome.