How to use 100% of the GPU for inference in Python?

#1

Hi,

I am detecting faces with Python and MXNet, but my 2080 Ti GPU is only 10% utilized.

I am feeding it 2 MP video. How can I use 100% of the GPU?

Best

#2

MXNet will use 100% of your GPU by default if nothing else limits it. The most common limiting factors are the speed of reading data from disk and a small batch size. See this video for more information: https://www.youtube.com/watch?v=Cqo7FPftNyo. While it talks about training performance optimization, the same logic applies to inference.
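
To see which of these factors dominates, it can help to time each stage of the loop separately. The following is a minimal sketch using only the standard library; the names `timed`, `report`, and `stage_seconds` are hypothetical helpers for illustration, not part of MXNet.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named stage (hypothetical helper state).
stage_seconds = defaultdict(float)

@contextmanager
def timed(stage):
    """Add the wall-clock time spent inside the `with` block to `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start

def report():
    """Return (stage, share of total measured time) pairs, largest share first."""
    total = sum(stage_seconds.values()) or 1.0
    shares = [(name, secs / total) for name, secs in stage_seconds.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)
```

Wrapping the frame decoding, the preprocessing, and the forward pass in separate `with timed("..."):` blocks gives a rough breakdown. One caveat: MXNet queues GPU work asynchronously, so the forward block should end with `mx.nd.waitall()` or an `.asnumpy()` call on the outputs, otherwise the GPU time is attributed to whichever later stage first waits for the result.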

#3

Hi @Sergey, thanks for your feedback. I am trying to detect faces in video. Here is a glimpse of the code:

  def detect(self, img, threshold=0.5, scales=[1.0], do_flip=False):
      #print('in_detect', threshold, scales, do_flip, do_nms)
      proposals_list = []
      scores_list = []
      landmarks_list = []
      timea = datetime.datetime.now()
      flips = [0]
      if do_flip:
          flips = [0, 1]

      for im_scale in scales:

          im_tensor = np.zeros((1, 3, im.shape[0], im.shape[1]))
          for i in range(3):
              im_tensor[0, i, :, :] = (im[:, :, 2 - i]/self.pixel_scale - self.pixel_means[2 - i])/self.pixel_stds[2-i]
          if self.debug:
              timeb = datetime.datetime.now()
              diff = timeb - timea
              print('X2 uses', diff.total_seconds(), 'seconds')
          data = nd.array(im_tensor)
          db = mx.io.DataBatch(data=(data,), provide_data=[('data', data.shape)])
          if self.debug:
              timeb = datetime.datetime.now()
              diff = timeb - timea
              print('X3 uses', diff.total_seconds(), 'seconds')
          self.model.forward(db, is_train=False)
          net_out = self.model.get_outputs()

How can I get the GPU to 100% utilization?

#4

It is hard to say anything specific from this code snippet.

What is your batch size? How much GPU memory is occupied when you run inference? How many workers do you use to read data?
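
To make the "batch size" and "workers" questions concrete, here is a rough sketch of grouping frames into batches and reading them on a background thread so the GPU does not sit idle while the next frames are decoded. `batched` and `prefetch` are hypothetical helpers written for illustration, not MXNet functions, and a single reader thread is an assumption that may or may not be enough for 2 MP video.

```python
import queue
import threading

def batched(frames, batch_size):
    """Group an iterable of frames into lists of at most `batch_size`, in order."""
    batch = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # last, possibly smaller, batch
        yield batch

def prefetch(iterable, depth=2):
    """Yield items of `iterable` in order while a worker thread reads ahead."""
    q = queue.Queue(maxsize=depth)  # bounded, so the reader cannot run far ahead
    done = object()                 # sentinel marking the end of the stream

    def worker():
        try:
            for item in iterable:
                q.put(item)
        finally:
            q.put(done)  # always unblock the consumer, even after an error

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is done:
            return
        yield item
```

With a video reader that yields frames, `for batch in prefetch(batched(frames, 25)):` would hand 25 frames at a time to the model while the next 25 are being read. Because there is one producer and a FIFO queue, the batches arrive in the original frame order.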

#5

Hi @Sergey, unfortunately I am new to MXNet and Python :frowning: I am mostly a C++ girl.

As far as I can see from the code, it takes one image per detection call :slight_smile:

  def detect(self, img, threshold=0.5, scales=[1.0], do_flip=False):

I don't know how to pass in more images (such as 25 frames) and get their inference results back in order :frowning:
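
A minimal sketch of one way this could look, under several assumptions: all frames have the same size and are HxWx3 BGR arrays, the normalization mirrors the per-channel loop in the `detect` snippet above, and `model` is an already bound `mx.mod.Module` like `self.model`. The function names `frames_to_batch` and `forward_batch` are hypothetical.

```python
import numpy as np

def frames_to_batch(frames, pixel_means, pixel_stds, pixel_scale=1.0):
    """Stack equally sized HxWx3 frames into one float32 batch of shape (N, 3, H, W)."""
    # Shape (N, H, W, 3); np.stack raises ValueError if the frame sizes differ.
    batch = np.stack(frames).astype(np.float32)
    means = np.asarray(pixel_means, dtype=np.float32)
    stds = np.asarray(pixel_stds, dtype=np.float32)
    # Same arithmetic as the original loop, applied to every frame at once.
    batch = (batch / pixel_scale - means) / stds
    # Reverse the channel axis (the `2 - i` indexing), then NHWC -> NCHW.
    batch = batch[..., ::-1].transpose(0, 3, 1, 2)
    return np.ascontiguousarray(batch)

def forward_batch(model, batch):
    """Run one forward pass on the whole batch; `model` is assumed to be a bound Module."""
    import mxnet as mx  # imported here so the preprocessing above works without MXNet
    data = mx.nd.array(batch)
    db = mx.io.DataBatch(data=(data,), provide_data=[('data', data.shape)])
    model.forward(db, is_train=False)
    # Assumption: every output keeps the batch axis first, so row k of each
    # output belongs to frames[k] and the input order is preserved.
    return [out.asnumpy() for out in model.get_outputs()]
```

For 25 frames this would be `outputs = forward_batch(self.model, frames_to_batch(frames, self.pixel_means, self.pixel_stds, self.pixel_scale))`. The post-processing that follows in `detect` is not shown in the snippet, so it would still need to be adapted to loop over the batch index instead of assuming a single image.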