BERT Transfer Learning

I am looking for an MXNet implementation of a BERT-based transfer learning sample (preferably multi-GPU) in which the final layer is customized for a specific use case. I want to use my own dataset, which contains 10 classes based on topic/theme.

Do you mean fine-tuning?

Yes, fine-tuning it for a custom application. Unfortunately, I did not find a good sample for the MXNet framework.

Have you checked the GluonNLP site? The SQuAD example may be a useful example for you, but I don’t think it uses multiple GPUs.
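To make the two pieces of the question concrete (a custom end layer for 10 classes, and splitting work across several GPUs), here is a minimal framework-free sketch in plain Python. It is not MXNet code: `ClassifierHead` and `shard_batch` are hypothetical names, the 768-dimensional input assumes the BERT-base hidden size, the weights are random rather than trained, and the "pooled outputs" are fake data standing in for the encoder. In MXNet the head would typically be a `gluon.nn.Dense` layer on top of the pretrained encoder, and the sharding is the job of `gluon.utils.split_and_load`.

```python
import math
import random

NUM_CLASSES = 10      # from the question: 10 topic/theme classes
HIDDEN_SIZE = 768     # assumption: BERT-base pooled output size


class ClassifierHead:
    """Hypothetical stand-in for the custom end layer: one dense layer + softmax."""

    def __init__(self, hidden_size, num_classes, seed=0):
        rng = random.Random(seed)
        # Small random weights; a real head would be trained during fine-tuning.
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
                       for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def logits(self, pooled):
        # One score per class: dot(weight_row, pooled) + bias.
        return [sum(w * x for w, x in zip(row, pooled)) + b
                for row, b in zip(self.weight, self.bias)]

    def predict_proba(self, pooled):
        scores = self.logits(pooled)
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


def shard_batch(batch, num_devices):
    """Split a batch into near-equal contiguous shards, one per device."""
    base, extra = divmod(len(batch), num_devices)
    shards, start = [], 0
    for i in range(num_devices):
        size = base + (1 if i < extra else 0)
        shards.append(batch[start:start + size])
        start += size
    return shards


# Fake pooled encoder outputs for a batch of 8 sentences.
rng = random.Random(1)
batch = [[rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)] for _ in range(8)]

head = ClassifierHead(HIDDEN_SIZE, NUM_CLASSES)
for device_id, shard in enumerate(shard_batch(batch, num_devices=2)):
    probs = [head.predict_proba(vec) for vec in shard]
    labels = [p.index(max(p)) for p in probs]
    print(f"device {device_id}: {len(shard)} examples, predicted classes {labels}")
```

In a real fine-tuning run each shard would go through the full encoder plus head on its own GPU, and the gradients from all devices would be combined before each parameter update.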

@w_a_r_b_e, you can find:


Super helpful. Thank you!