The code is as follows. The regrouping ("reunion") part is meant to increase training speed: the whole batch goes through the network in a single forward pass, and the anchor, positive, and negative embeddings are then gathered out of the shared predictions by index.
    import tqdm
    from mxnet import autograd

    for epoch in range(epochs):
        train_loss = 0.
        for features, indices in tqdm.tqdm(train_data):
            # each element of indices is an (anchor, positive, negative) index triple
            a_index, p_index, n_index = (list(idx) for idx in zip(*indices))
            data = features.as_in_context(ctx)
            with autograd.record():
                pred = net(data)
                # reunion: gather the shared predictions back into
                # anchor, positive, and negative embeddings by row index
                a = pred[a_index, :]
                p = pred[p_index, :]
                n = pred[n_index, :]
                loss_ = loss(a, p, n).sum()
            loss_.backward()
            trainer.step(batch_size, ignore_stale_grad=True)
            train_loss += loss_.asscalar()
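For reference, here is a minimal standalone sketch of how one could check that MXNet's autograd propagates gradients through this kind of row gathering; the array values and variable names are made up for illustration and are not part of the training code above:

    import mxnet as mx
    from mxnet import autograd, nd

    x = nd.array([[1., 2.], [3., 4.], [5., 6.]])
    x.attach_grad()
    with autograd.record():
        gathered = x[[0, 2], :]   # same row-gathering pattern as the "reunion"
        out = gathered.sum()
    out.backward()
    print(x.grad)  # expected: ones in rows 0 and 2, zeros in row 1

If the printed gradient is nonzero only in the gathered rows, the indexing is differentiable, so loss_.backward() should reach the network parameters through pred.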
Is this regrouping reasonable? And will loss_ update the parameters correctly?