Training YoloV3 with input dimensions 608x608 returns NaN loss

Greetings, everyone.

I am trying to train a YoloV3 on a custom dataset by referencing [1] and modifying [2]. In the model zoo, I saw that YoloV3 is offered with 3 input dimensions: 320, 416 and 608.

I tried training with input dimension 608 first, but after several epochs all losses started reporting NaN, so I switched to 416 and the losses no longer reported NaN. Note that I am using SGD with multi-precision: true.

I would still like to use 608 as my input size and was wondering if anyone could offer guidance on the cause of the NaN issue.
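
For anyone trying to narrow down when the losses first go bad, a small check like the one below may help. It is only a sketch in plain Python: the loss names and values are hypothetical placeholders, not taken from the training script, and in a real loop you would pass in the scalar loss values of the current batch.

```python
import math

def first_non_finite(losses):
    """Return the name of the first loss that is NaN or infinite, or None."""
    for name, value in losses.items():
        if not math.isfinite(value):
            return name
    return None

# Hypothetical per-batch loss values; names and numbers are illustrative only.
batch_losses = {
    "obj_loss": 1.25,
    "center_loss": float("nan"),
    "scale_loss": 0.4,
    "cls_loss": 0.9,
}

bad = first_non_finite(batch_losses)
if bad is not None:
    print(f"non-finite value in {bad}; stop and inspect this batch")
```

Stopping at the first non-finite batch, instead of waiting until the end of the epoch, makes it easier to tell whether the problem comes from a specific input or builds up gradually.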


I seem to have found the root cause of my issue: there was a bug in MXNet version 1.5 that can only be resolved by installing the master/nightly package of MXNet.
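
To check whether an environment is on the 1.5 series, something along these lines could be used. This is a rough sketch, not an official check: `is_1_5_series` is a hypothetical helper that only looks at the major.minor prefix of the version string, so it cannot tell whether a particular build actually contains the fix.

```python
import re

def is_1_5_series(version):
    """Return True if the version string starts with major.minor 1.5."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) == (1, 5)

# Example strings; in practice pass mxnet.__version__ instead.
print(is_1_5_series("1.5.0"))  # True
print(is_1_5_series("1.4.1"))  # False
```

If this returns True for the installed version, the NaN behaviour described above is worth suspecting before spending time on hyperparameters.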

For the convenience of everyone that might run into this issue, refer to:

Thanks for sharing your solution, @Lee.

As a reminder for others, here is how to install the nightly version of MXNet, for example:

pip install mxnet-cu92mkl --pre