The code is as follows. The "reunion" part (embedding the whole batch once and then indexing the predictions out into anchor, positive, and negative rows) is helpful for increasing the training speed, since each image only goes through the network once.
import tqdm
from mxnet import autograd

# net, trainer, loss, ctx, epochs, batch_size, and train_data are all
# defined earlier in my script.
for epoch in range(epochs):
    train_loss = 0.
    for features, indices in tqdm.tqdm(train_data):
        # each entry of indices is an (anchor, positive, negative) triple
        a_index, p_index, n_index = zip(*indices)
        data = features.as_in_context(ctx)
        with autograd.record():
            pred = net(data)
            # reunion: regroup the predictions as anchor, positive, negative
            a = pred[a_index, :]
            p = pred[p_index, :]
            n = pred[n_index, :]
            loss_ = loss(a, p, n).sum()
        loss_.backward()
        trainer.step(batch_size, ignore_stale_grad=True)
        train_loss += loss_.asscalar()
Is this reunion step reasonable? And will loss_ update the network parameters correctly?
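
For reference, here is a minimal, self-contained sketch of what the reunion step does in isolation. The dummy shapes, the index triples, and the use of gluon.loss.TripletLoss for loss are illustrative assumptions, not my real data:

from mxnet import nd, gluon

# Hypothetical batch of 4 embeddings of size 3 (dummy values, not real data)
pred = nd.arange(12).reshape((4, 3))
# Two (anchor, positive, negative) triples over rows of the batch
indices = [(0, 1, 2), (0, 3, 1)]
a_index, p_index, n_index = zip(*indices)  # -> (0, 0), (1, 3), (2, 1)
# Gather the rows for each group; lists trigger MXNet advanced indexing
a = pred[list(a_index), :]
p = pred[list(p_index), :]
n = pred[list(n_index), :]
# The regrouped predictions feed the triplet loss as in the loop above
loss = gluon.loss.TripletLoss(margin=1)
print(loss(a, p, n).sum())  # scalar NDArray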