Triplet (loss) generator for face recognition


#1

Hello everyone,

I am trying to implement a face recognition model which uses triplet loss (like FaceNet). The loss function itself is not the problem; I'm struggling to build a proper triplet image generator that feeds batches into the neural network. Each batch should contain multiple image triplets, each consisting of an anchor image plus a positive and a negative identity (image).
I saw that the Gluon API already provides a TripletLoss (https://mxnet.incubator.apache.org/api/python/gluon/loss.html#mxnet.gluon.loss.TripletLoss), but I did not find a way to feed it with images. So far, I have the following:

def test_triplet_loss():
    np.random.seed(1234)
    N = 20
    data = mx.random.uniform(-1, 1, shape=(N, 10))
    pos = mx.random.uniform(-1, 1, shape=(N, 10))
    neg = mx.random.uniform(-1, 1, shape=(N, 10))
    data_iter = mx.io.NDArrayIter(data, {'pos': pos, 'neg': neg}, batch_size=10,
                                  label_name='label', shuffle=True)
    output = get_net(10)
    pos = mx.symbol.Variable('pos')
    neg = mx.symbol.Variable('neg')
    Loss = gluon.loss.TripletLoss()
    loss = Loss(output, pos, neg)
    loss = mx.sym.make_loss(loss)
    mod = mx.mod.Module(loss, data_names=('data',), label_names=('pos', 'neg'))
    mod.fit(data_iter, num_epoch=200, optimizer_params={'learning_rate': 0.01},
            initializer=mx.init.Xavier(magnitude=2), eval_metric=mx.metric.Loss(),
            optimizer='adam')

But this uses mx.io.NDArrayIter and no actual images. So how can I feed triplet image batches to the network? There must be a way to select a positive, a negative and an anchor image to form a proper triplet, and to concatenate multiple triplets into batches. I thought that since there is already an API for the triplet loss, there might also be a way to train a network based on this loss. :slight_smile:

Has anyone ever tried to implement an image triplet generator in order to train a neural network with MXNet? Or do you have an idea how this could be done?

Thanks in advance and kind regards,

kepoggem


#2

Hi,

If I understand correctly, the problem is how to forward the network with multiple inputs? Or the data generator? If it is the network (I am using custom data generators), then the following should help:

A sketch of the code (in Gluon) goes like this (3 input images, 3 outputs; it can actually be any number of inputs/outputs):



from mxnet import gluon
from mxnet.gluon import HybridBlock

class NetworkX3(HybridBlock):
    def __init__(self, _some_arguments, **kwards):
        HybridBlock.__init__(self, **kwards)

        self.arguments = _some_arguments

        # here you construct your network
        with self.name_scope():
            self.Layer1 = ...  # some layer
            self.Layer2 = ...  # more layers, e.g. you can build this upon Sequential
            self.Layer3 = gluon.nn.Conv2D(...)  # or some other network branch etc.

    def hybrid_forward(self, F, _input1, _input2, _input3):
        x1 = self.Layer1(_input1)
        x2 = self.Layer2(_input2)
        x3 = self.Layer3(_input3)  # and so on

        # Here you can do any combinations you want to perform.

        return x1, x2, x3

So now, once you define and initialize your network, you call it like:

import mxnet as mx
from mxnet import autograd
# more imports

mynet = NetworkX3(some_arguments)
mynet.initialize(mx.initializer.Xavier())

img1, img2, img3 = some_data_generator()

with autograd.record():
    x1, x2, x3 = mynet(img1, img2, img3)
    loss1 = SomeLoss(x1)
    loss2 = SomeLoss(x2)
    loss3 = SomeLoss(x3)
    loss = loss1 + loss2 + loss3

loss.backward()
trainer.step(batch_size)
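For the data-generator part, here is a minimal sketch of how triplet indices could be sampled from a labelled image list. This is not an existing MXNet API; the function name `sample_triplets` and the `labels` structure are just assumptions for illustration:

```python
import random

def sample_triplets(labels, batch_size, rng=random):
    """Sample (anchor, positive, negative) index triplets.

    labels: list where labels[i] is the identity of image i.
    Returns three lists of image indices, each of length batch_size.
    """
    # group image indices by identity
    by_id = {}
    for idx, lab in enumerate(labels):
        by_id.setdefault(lab, []).append(idx)
    # identities usable as anchors need at least two images (anchor + positive)
    usable = [lab for lab, idxs in by_id.items() if len(idxs) >= 2]

    anchors, positives, negatives = [], [], []
    for _ in range(batch_size):
        lab = rng.choice(usable)
        a, p = rng.sample(by_id[lab], 2)  # two distinct images, same identity
        neg_lab = rng.choice([l for l in by_id if l != lab])
        n = rng.choice(by_id[neg_lab])    # image of a different identity
        anchors.append(a)
        positives.append(p)
        negatives.append(n)
    return anchors, positives, negatives
```

You would then load the images behind the three index lists (e.g. with mx.image.imread), stack each list into an NDArray of shape (batch_size, 3, H, W), and pass the three arrays to the network above.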

Hope this helps.


#3

I also found some problems with this function, gluon.loss.TripletLoss(anchor, pos, neg), when I feed the anchor, positive and negative accordingly. Their shapes are n by 128, but the loss I get has shape 2 by n by 128. I am confused.
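For reference, the formula in the TripletLoss documentation is max(0, Σ(a−p)² − Σ(a−n)² + margin), with the sum taken over every axis except the batch axis, so for inputs of shape (n, 128) the loss should come out with shape (n,). A NumPy sketch of that formula (not the Gluon implementation itself) can be used to sanity-check the expected shape; if you see (2, n, 128), double-check the shapes actually reaching the loss:

```python
import numpy as np

def triplet_loss_np(anchor, pos, neg, margin=1.0):
    """Per-sample triplet loss as documented:
    max(0, sum((a - p)^2) - sum((a - n)^2) + margin),
    summed over every axis except the batch axis (axis 0)."""
    reduce_axes = tuple(range(1, anchor.ndim))
    d_pos = np.sum((anchor - pos) ** 2, axis=reduce_axes)
    d_neg = np.sum((anchor - neg) ** 2, axis=reduce_axes)
    return np.maximum(d_pos - d_neg + margin, 0.0)

n = 5
rng = np.random.default_rng(0)
a = rng.standard_normal((n, 128))
p = rng.standard_normal((n, 128))
q = rng.standard_normal((n, 128))
loss = triplet_loss_np(a, p, q)
print(loss.shape)  # one scalar per triplet: (5,)
```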