I am training a 3-layer BiLSTM for sequence labeling for text.
I have tried it on different versions of MXNet (1.1/1.2/1.3) and CUDA (8/9).
On P3 instances, the training pipeline freezes non-deterministically with 100% volatile GPU utilization.
The same pipeline runs fine on P2 instances, with the same Python packages, CUDA, and MXNet versions.
The task is character-level sequence prediction: I limit the sequence length to 200 characters and the batch size to 32. I also tried smaller sequence lengths and batch sizes. The characters are just mapped to integers. The pipeline is akin to character-level models for predicting punctuation.
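To give a sense of the input format without the proprietary loader, here is a rough sketch of the encoding step in plain Python. The names `build_vocab`, `encode`, and `batches` are made up for illustration, and reserving 0 for padding and 1 for unknown characters is an assumption, not necessarily what my real pipeline does.

```python
MAX_LEN = 200     # sequence length cap described above
BATCH_SIZE = 32   # batch size described above
PAD_ID = 0        # assumption: 0 is reserved for padding
UNK_ID = 1        # assumption: 1 is reserved for unseen characters


def build_vocab(texts):
    """Map every character seen in `texts` to an integer id, starting at 2."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(text, vocab, max_len=MAX_LEN):
    """Truncate to `max_len` characters, map to ids, and right-pad with PAD_ID."""
    ids = [vocab.get(ch, UNK_ID) for ch in text[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def batches(texts, vocab, batch_size=BATCH_SIZE):
    """Yield lists of encoded sequences; the last batch may be smaller."""
    for start in range(0, len(texts), batch_size):
        yield [encode(t, vocab) for t in texts[start:start + batch_size]]
```

Each batch is a list of up to 32 integer lists of length 200, which is what gets handed to the network.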
Unfortunately, I cannot share the exact data and dataloader I am using, as they are proprietary.
Are there any similar public implementations I can try to run, or any other diagnostics I can provide that might help?
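One diagnostic I can add on my side is a watchdog built on Python's standard `faulthandler` module, which dumps the stack of every Python thread if a training step takes too long. This is only a sketch: `train_step` is a hypothetical stand-in for one iteration of my loop, and the 300-second timeout is an arbitrary choice.

```python
import faulthandler
import sys

HANG_TIMEOUT_S = 300  # arbitrary threshold for calling a step "hung"


def run_with_watchdog(train_step, num_steps, timeout=HANG_TIMEOUT_S, file=None):
    """Run `train_step(step)` repeatedly, dumping all thread stacks on a stall.

    `train_step` is a hypothetical callable wrapping one training iteration.
    """
    out = sys.stderr if file is None else file
    for step in range(num_steps):
        # Arm the timer: if this step has not finished within `timeout`
        # seconds, tracebacks of all threads are written to `out`.
        faulthandler.dump_traceback_later(timeout, exit=False, file=out)
        try:
            train_step(step)
        finally:
            # Disarm the timer once the step returns or raises.
            faulthandler.cancel_dump_traceback_later()
```

Since MXNet executes operations asynchronously, I would expect the dump to show where the Python side is blocked waiting on a result rather than which GPU operation is stuck, so it may only narrow things down.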