Speed Issue converting NDarray to np.array

David_McDonough · August 20, 2019, 5:31am

Hi there,

I am using the Gluon package to run a Faster RCNN Coco-trained model (Link here - this is for ssd but I’m using a FRCNN model).

The issue I’m having is that converting the output arrays for bounding_boxes, scores & class_IDs takes a very long time. This is because the MXNet engine is asynchronous, and the system needs to finish its computations before it can convert the array. This delay is killing me, as we have thousands of images to perform detections on, each taking about 1-2 seconds.

The FasterRCNN output gives us an 80000 element array, although only the first 10 are needed. Slicing the array to be the first 10 and using .asnumpy() still takes a long time though, because it is still computing the other elements…

I can only think of three solutions to speed up the code:

Make the initial output of the FRCNN network shorter, e.g. a maximum output array length of 10
Use environment variables to change the engine to a synchronous engine (under “Engine Type”)
Somehow find a way to make asnumpy() faster

Can anyone help?
Thanks,
David

NRauschmayr · August 20, 2019, 7:47pm

What is your current batch size? Why do you have an 80000 element array? How many classes are you trying to predict?

asnumpy() is a blocking call, so execution stops until the result can be retrieved. Try to avoid that call and use MXNet NDArray operations instead: https://mxnet.incubator.apache.org/api/python/ndarray/ndarray.html For instance, if you are only interested in the objects with the highest probabilities, you can use ndarray.argmax: https://mxnet.incubator.apache.org/api/python/ndarray/ndarray.html#mxnet.ndarray.NDArray.argmax

There are several other options to speed up the inference:

run inference on large image batches to increase throughput
optimize your model with TensorRT

Here are some useful links:

David_McDonough · August 21, 2019, 12:19am

Thanks NRauschmayr,

Below is an example of the code from the gluon website.

from matplotlib import pyplot as plt
import gluoncv
from gluoncv import model_zoo, data, utils

# Load a pretrained Faster RCNN model from the GluonCV model zoo
net = model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained=True)

# Download a test image and preprocess it for the RCNN model
im_fname = utils.download('https://github.com/dmlc/web-data/blob/master/' +
                          'gluoncv/detection/biking.jpg?raw=true',
                          path='biking.jpg')
x, orig_img = data.transforms.presets.rcnn.load_test(im_fname)

# Forward pass: class IDs, scores and boxes, each with a leading batch dimension
box_ids, scores, bboxes = net(x)
ax = utils.viz.plot_bbox(orig_img, bboxes[0], scores[0], box_ids[0], class_names=net.classes)

plt.show()

box_ids, scores and bboxes are all returned as 80000-element arrays. However, only the first 6 entries are valid; the rest have scores of -1.

Can I add a parameter (such as a batch size) so that fewer elements are returned and fewer computations are needed?

Thanks
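
For reference, a minimal sketch of trimming the padded outputs after a single conversion. It assumes per-image shapes of (N, 1) for the IDs and scores and (N, 4) for the boxes, and that padded slots carry a score of -1 as described above. The arrays are synthetic NumPy stand-ins for results converted once with asnumpy(), and keep_valid is a hypothetical helper, not part of GluonCV. This only shrinks what is handled afterwards; it does not reduce the work done inside the network.

```python
import numpy as np

def keep_valid(ids, scores, boxes, max_det=10):
    """Return the highest-scoring valid detections for one image.

    ids, scores: arrays of shape (N, 1); boxes: array of shape (N, 4).
    Padded slots are assumed to have a score of -1.
    """
    flat = scores.reshape(-1)
    valid = np.flatnonzero(flat >= 0)                  # drop padded slots
    order = valid[np.argsort(-flat[valid])][:max_det]  # best scores first
    return ids[order].reshape(-1).astype(int), flat[order], boxes[order]

# Synthetic stand-in for one image's output: 3 real detections, the rest padding.
n = 1000
ids = np.full((n, 1), -1.0, dtype=np.float32)
scores = np.full((n, 1), -1.0, dtype=np.float32)
boxes = np.full((n, 4), -1.0, dtype=np.float32)
ids[:3, 0] = [1, 14, 6]
scores[:3, 0] = [0.5, 0.875, 0.75]
boxes[:3] = [[10, 20, 110, 220], [30, 40, 130, 240], [50, 60, 150, 260]]

top_ids, top_scores, top_boxes = keep_valid(ids, scores, boxes, max_det=2)
```

With real model output, the inputs would be something like box_ids[0].asnumpy(), scores[0].asnumpy() and bboxes[0].asnumpy(), converted once per image.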
