How to train a model with a huge number of classes


For example, I am training a face recognition model with millions of identities. Besides triplet loss, I would like to use softmax-based losses such as ArcFace, AM-Softmax, and so on. However, with this many classes, GPU memory is insufficient. Is there a way to train a model like this? Maybe splitting the softmax layer across multiple GPUs would work; I wonder whether MXNet supports this.
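Splitting the softmax layer across GPUs (model parallelism) is mathematically straightforward: each device holds a slice of the class-weight matrix, computes its partial logits, and the devices only need to exchange one scalar max and one scalar sum to normalize. A minimal NumPy sketch of that math (`sharded_softmax` is an illustrative name, with lists standing in for per-GPU shards):

```python
import numpy as np

def sharded_softmax(features, weight_shards):
    """Softmax over a class dimension partitioned into shards.

    Each shard is a slice of the full class-weight matrix; on a real
    multi-GPU setup each slice would live on a different device.
    """
    # Per-shard partial logits (done locally on each device).
    shard_logits = [w @ features for w in weight_shards]
    # Global max for numerical stability: one scalar per shard is exchanged.
    g_max = max(s.max() for s in shard_logits)
    exp_shards = [np.exp(s - g_max) for s in shard_logits]
    # Global normalizer: again only one scalar per shard is exchanged.
    denom = sum(e.sum() for e in exp_shards)
    return np.concatenate([e / denom for e in exp_shards])
```

The result is identical to computing the softmax over the full weight matrix on one device, which is why this kind of sharding is a common way to fit very large classification layers.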


Computing the softmax over millions of classes is very expensive. You could use a sampled softmax loss instead, which takes only a subset of classes into account in the loss. Here is a nice article about how to optimize the softmax:
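The idea behind sampled softmax is to draw a small set of negative classes per example and normalize only over that subset plus the true class. A minimal NumPy sketch, assuming uniform negative sampling (real implementations usually use a log-uniform sampler and apply a bias correction for the sampling distribution):

```python
import numpy as np

def sampled_softmax_loss(features, weights, true_class, num_sampled, rng):
    """Cross-entropy over a sampled subset of classes.

    features: (d,) embedding of one example
    weights:  (num_classes, d) full output weight matrix
    """
    num_classes = weights.shape[0]
    # Sample negative classes uniformly (illustrative; drop the true class).
    negatives = rng.choice(num_classes, size=num_sampled, replace=False)
    negatives = negatives[negatives != true_class]
    classes = np.concatenate(([true_class], negatives))
    # Only |classes| rows of the weight matrix are touched, not all of them.
    logits = weights[classes] @ features
    # Numerically stable log-softmax over the subset; target index is 0.
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]
```

Because only `num_sampled + 1` rows of the weight matrix participate in each step, both the memory and the compute per example become independent of the total number of classes.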


You can have a look at the sampled blocks in the gluon-nlp package: