MXNet Forum

How to use the insightface model with the cpp-package to extract face features for recognition?


#1

As the title says, I would like to use the model from deepinsight via the cpp-package.

The following is my approach:

  1. Function to load a checkpoint

    void load_check_point(std::string const &model_params,
                          std::string const &model_symbol,
                          Symbol *symbol,
                          std::map<std::string, NDArray> *arg_params,
                          std::map<std::string, NDArray> *aux_params,
                          Context const &ctx)
    {
        Symbol new_symbol = Symbol::Load(model_symbol);
        std::map<std::string, NDArray> args;
        std::map<std::string, NDArray> auxs;
        // The .params file stores names as "arg:xxx" or "aux:xxx";
        // split on that prefix to rebuild the two dictionaries
        for (auto const &iter : NDArray::LoadToMap(model_params)) {
            std::string const type = iter.first.substr(0, 4);
            std::string const name = iter.first.substr(4);
            if (type == "arg:") {
                args[name] = iter.second.Copy(ctx);
            } else if (type == "aux:") {
                auxs[name] = iter.second.Copy(ctx);
            }
        }

        *symbol = new_symbol;
        *arg_params = args;
        *aux_params = auxs;
    }
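For reference, the `arg:`/`aux:` prefix convention used above mirrors what `mx.model.load_checkpoint` does on the Python side. A minimal stdlib-Python sketch of just the name-splitting logic (the helper name `split_params` is made up for illustration; string values stand in for the NDArrays):

```python
# Sketch of the "arg:"/"aux:" name-splitting convention used in
# load_check_point. The parameter file stores entries such as
# "arg:fc1_weight" and "aux:bn1_moving_mean"; anything else is ignored.
def split_params(names):
    args, auxs = {}, {}
    for name in names:
        prefix, rest = name[:4], name[4:]
        if prefix == "arg:":
            args[rest] = name   # real code stores the NDArray here
        elif prefix == "aux:":
            auxs[rest] = name
    return args, auxs
```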
    
  2. Load the checkpoint

     Symbol net;
     std::map<std::string, NDArray> args, auxs;
     load_check_point(model_params, model_symbols, &net, &args, &auxs, context);

     // Placeholder for the input image, NCHW layout: batch 1, 3 channels, 112x112
     args["data"] = NDArray(Shape(1, 3, 112, 112), context);
     executor_.reset(net.SimpleBind(context,
                                    args,
                                    std::map<std::string, NDArray>(),   // no gradient storage
                                    std::map<std::string, OpReqType>(), // no gradients requested
                                    auxs));
    
  3. Copy the face pixels into an NDArray

     cv::Mat resize_img;
     cv::resize(input, resize_img, cv::Size(112, 112));
     cv::Mat flat_image(resize_img.rows, resize_img.cols, CV_32FC3);
     // Copy the interleaved pixels into flat_image as three planes:
     // plane 0 holds the blue channel, plane 1 green, plane 2 red
     auto *flat_ptr = flat_image.ptr<float>(0, 0);
     for (int ch = 0; ch != 3; ++ch) {
         for (int row = 0; row != flat_image.rows; ++row) {
             for (int col = 0; col != flat_image.cols; ++col) {
                 auto const &vec = resize_img.at<cv::Vec3b>(row, col);
                 *flat_ptr = vec[ch];
                 ++flat_ptr;
             }
         }
     }

     auto data = NDArray(flat_image.ptr<float>(), Shape(1, 3, flat_image.rows, flat_image.cols), *context_);
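To double-check step 3 independently of OpenCV, the interleaved-HWC to planar-CHW copy can be sketched in plain Python (a list of lists stands in for the `cv::Mat`; this illustrates the loop order, it is not the actual code path):

```python
# Plain-Python sketch of the interleaved-HWC -> planar-CHW copy done by
# the triple loop above: all blue values first, then green, then red.
def hwc_to_chw(img, rows, cols):
    # img[row][col] is a (B, G, R) triple, as OpenCV stores pixels
    flat = []
    for ch in range(3):
        for row in range(rows):
            for col in range(cols):
                flat.append(float(img[row][col][ch]))
    return flat
```

This matches what `np.transpose(img, (2,0,1))` produces on the Python side, which is why the two pipelines can be compared at all.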
    
  4. Extract the features

     data.CopyTo(&executor_->arg_dict()["data"]);
     executor_->Forward(false); // inference only, no gradients
     if (!executor_->outputs.empty()) {
         // Copy the 512-dimensional embedding back to the CPU
         auto const features = executor_->outputs[0].Copy(Context(kCPU, 0));
         features.WaitToRead();
         cv::Mat temp(1, 512, CV_32F, const_cast<mx_float*>(features.GetData()), 0);
         cv::normalize(temp, temp, 1, 0, cv::NORM_L2);
         return face_key(features);
     }
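For what it's worth, the `NORM_L2` normalization in step 4 followed by the `np.dot` in the Python comparison is just cosine similarity. A plain-Python sketch of what that combination computes:

```python
import math

# L2-normalize a feature vector, then take the dot product of two
# normalized vectors; this is the cosine similarity the thread compares.
def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity(f1, f2):
    a, b = l2_normalize(f1), l2_normalize(f2)
    return sum(x * y for x, y in zip(a, b))
```

Identical embeddings give 1.0, orthogonal ones give 0.0, so the scores printed below are directly comparable between the two implementations.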
    

However, the results are not the same as those produced by the Python version.

Simplified Python solution:

model = face_model.FaceModel(args)

img = cv2.imread('they.jpg')
if img is None:  # cv2.imread returns None on failure, not an empty array
	print("image is null")
else:
	print("image exists")
	img = cv2.resize(img, (2,2))
	print(img)
	print("\n--------------------------------\n")
	img = np.transpose(img, (2,0,1))
	print(img)
	print("\n--------------------------------\n")	
	
	img = cv2.imread('they.jpg')
	img = cv2.resize(img, (112,112))
	img = np.transpose(img, (2,0,1))
	print(img.shape)
	f1 = model.get_feature(img)	
	
	# compare similarity values against 100 images
	for i in range(100):
		img = cv2.imread("my_" + str(i) + ".jpg")
		img = cv2.resize(img, (112,112))
		img = np.transpose(img, (2,0,1))
		f2 = model.get_feature(img)
	
		sim = np.dot(f1, f2.T)
		print(i, ":", sim)

The Python and C++ results are close, but not identical.

Python results

0 : 0.15990853
1 : -0.079816
2 : -0.06281938
3 : -0.06564736
4 : 0.16785772
5 : 0.10493934
6 : 0.07540929

C++ results

0:0.156019
1:-0.0790096
2:-0.0683103
3:-0.0695442
4:0.167428
5:0.0900509
6:0.0715431

Reasons I could think of:

  1. The image after resizing is slightly different between C++ and Python (unlikely to be the culprit)
  2. The way I copy the data into the NDArray is wrong
  3. I am loading the model checkpoint the wrong way

Does anyone know what the reason is? Thanks.


#2

I think it’s the alignment of the face that causes this small difference between the two kinds of results.


#5

From what you say, I would actually think that the image resizing is the culprit. A small change in the colors or sharpness of the resized image will affect the prediction, but if the model is stable, it shouldn’t affect it much. And that is exactly what I see in the numbers you provided: the difference between predictions is very small, in the third digit after the decimal point.

If the problem were with copying data or loading the model itself, the difference would be much larger. Moreover, your model would most probably produce garbage results. To me, the fact that the difference is small means that you most probably don’t have major issues with your code (or the issues are the same in both the C++ and the Python implementation).

I would check whether the model gives you correct predictions on your data, matching the ground-truth answers you have. If it does, and if this difference really matters to you, then I would try to find a way to make the image resizing/preprocessing work exactly the same way across languages.
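As a toy illustration of how much the interpolation choice alone can move pixel values (plain stdlib Python on a single row, not OpenCV’s actual kernels), nearest-neighbour and bilinear resizing of the same data already disagree:

```python
import math

# Toy 1-D resize: nearest-neighbour vs. bilinear interpolation give
# different pixel values for the same input row, which is the kind of
# small per-pixel difference that shifts the third decimal of a score.
def resize_row_nearest(row, new_len):
    scale = len(row) / new_len
    return [row[int(i * scale)] for i in range(new_len)]

def resize_row_linear(row, new_len):
    scale = len(row) / new_len
    out = []
    for i in range(new_len):
        pos = (i + 0.5) * scale - 0.5   # sample at pixel centers
        lo = max(int(math.floor(pos)), 0)
        hi = min(lo + 1, len(row) - 1)
        frac = pos - lo
        out.append(row[lo] * (1 - frac) + row[hi] * frac)
    return out
```

The same effect, compounded over 112×112 pixels and two libraries’ slightly different kernels and rounding, is consistent with the third-decimal drift in the scores.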


#6

The Python code does not do any alignment either, so I don’t think that is the reason.


#7

It does give me correct predictions on my data. I just wanted to confirm that I am doing the right thing; a small difference in the prediction scores is not a big deal for me.