Best practices when deploying an MXNet model

Hello all. I am trying to deploy my MXNet-based application to the web, and I want it to handle as many web requests as possible. I am using two Inception v3 models per request to provide an image classification. The problem is that web servers typically use multiple threads to handle concurrent requests, but when load testing my application, I notice that requests take a long time to complete.

What are some good suggestions for increasing performance when deploying an MXNet model to the web? Should I create a pool of Module objects to handle classification? Should I use one Module object for all requests? Which objects should I cache instead of re-creating them on each request?

Thanks!

Hi @qheaden,

Are you aware of the MXNet Model Server (MMS) project? I see they have a section on production deployments.

Production Deployments
When launched directly, MMS uses a standalone Flask server. This is handy for testing and development. But for production deployments, we recommend using Gunicorn which should provide lower latency, higher throughput, and more efficient use of memory.

This project includes Dockerfiles to build containers recommended for production deployments. These containers demonstrate how to set up a production stack consisting of nginx, gunicorn, and MMS. The basic usage can be found on the Docker readme.
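To make the Gunicorn part of that stack concrete, here is a minimal sketch of a Gunicorn config file. Gunicorn config files are plain Python; the values below are placeholders to tune for your hardware, and the module/app names in the launch command are hypothetical, not something MMS ships.

```python
# gunicorn.conf.py -- illustrative settings only; tune for your hardware.
# Each worker process loads its own copy of the model, so keep the worker
# count low enough that all copies fit in memory.
workers = 4            # often tied to CPU core count
timeout = 120          # allow for slow cold-start model loading
preload_app = False    # load the model per worker, after fork
```

You would then launch with something like `gunicorn -c gunicorn.conf.py service:app`, where `service:app` points at your WSGI application.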

Another technique that can help you achieve high throughput is to batch requests and run one forward pass over a batch rather than processing requests one after another. You could flush a batch when its size exceeds a certain number or after a certain amount of time has passed (whichever comes sooner). This does add the extra complexity of tracking which result belongs to which request, though.
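A minimal sketch of that size-or-timeout batching, using a background worker thread. Everything here is illustrative: `predict_fn` stands in for whatever you wrap around your Module (taking a list of inputs and returning a list of results), and the names and defaults are not MMS APIs.

```python
import queue
import threading
import time


class BatchWorker(threading.Thread):
    """Background thread that groups incoming requests into batches."""

    def __init__(self, predict_fn, max_batch=16, max_wait_s=0.05):
        super().__init__(daemon=True)
        self.predict_fn = predict_fn  # takes a list of inputs, returns a list
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.inbox = queue.Queue()

    def submit(self, item):
        """Called from web-server threads; blocks until the result is ready."""
        done = threading.Event()
        holder = {}
        self.inbox.put((item, done, holder))
        done.wait()
        return holder["result"]

    def run(self):
        while True:
            # Block until the first request arrives, then keep collecting
            # until the batch is full or max_wait_s elapses (whichever is sooner).
            batch = [self.inbox.get()]
            deadline = time.monotonic() + self.max_wait_s
            while len(batch) < self.max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self.inbox.get(timeout=remaining))
                except queue.Empty:
                    break
            # One forward pass for the whole batch, then hand results back
            # to the waiting request threads.
            results = self.predict_fn([item for item, _, _ in batch])
            for (_, done, holder), result in zip(batch, results):
                holder["result"] = result
                done.set()
```

Usage from a request handler would be something like `worker = BatchWorker(my_batch_predict); worker.start()` at startup, then `result = worker.submit(image_array)` per request. The single worker thread also means only one copy of the model is doing inference at a time, which sidesteps the question of sharing a Module across threads.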


Hi @qheaden,

This might also be useful if you are not already tuning with these methods:
https://mxnet.incubator.apache.org/faq/perf.html
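For CPU serving in particular, a common first step from that sort of tuning is setting the threading environment variables before importing MXNet. A small sketch; the values are placeholders that depend on your core count:

```python
# Illustrative CPU tuning knobs; set env vars before importing mxnet.
import os

os.environ["OMP_NUM_THREADS"] = "4"            # OpenMP threads per operator
os.environ["MXNET_CPU_WORKER_NTHREADS"] = "4"  # parallel CPU workers in the engine

import mxnet as mx
```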