MXNet model taking a long time to load


#1

I was working with the resnet50 model from https://github.com/deepinsight/insightface and noticed that the following line takes almost 8-9 minutes to run on a GPU (CUDA 8.0, Python 2.7):

model.bind(data_shapes=[('data', (1, 3, image_size[0], image_size[1]))])

Command:

python test.py --model ../models/model-r50-am-lfw/model,0 --flip 1

Log output:

loading ../models/model-r50-am-lfw/model 0
[17:51:46] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.12.1. Attempting to upgrade...
[17:51:46] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
[18:00:38] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[18:00:38] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
/usr/local/lib/python2.7/dist-packages/mxnet/model.py:928: DeprecationWarning: mxnet.model.FeedForward has been deprecated. Please use mxnet.mod.Module instead.
  **kwargs)
[18:00:38] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[18:00:38] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
[18:00:38] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[18:00:38] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
[18:00:38] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[18:00:38] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
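The timestamps above jump from 17:51:46 to 18:00:38, which matches the delay of almost nine minutes. To confirm which call accounts for that gap, each step can be timed separately. This is only a minimal sketch: `timed` and `timings` are hypothetical names that are not part of test.py, and `time.sleep` stands in for the real `model.bind(...)` call.

```python
import time
from contextlib import contextmanager

# Collected durations in seconds, keyed by label (hypothetical helper state).
timings = {}

@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.time()
    try:
        yield
    finally:
        timings[label] = time.time() - start
        print("%s took %.3f s" % (label, timings[label]))

# Stand-in for the slow call. In test.py the body of this block would be:
#   model.bind(data_shapes=[('data', (1, 3, image_size[0], image_size[1]))])
with timed("bind"):
    time.sleep(0.05)
```

Wrapping the checkpoint loading and the first forward pass in the same way would show whether the time is spent in `bind` itself or in a neighbouring step.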


#2

@domarps can you point to the exact file and line please?

Thanks!


#3

Thanks @ThomasDelteil – I have updated the post. I suspect the issue is similar to the following issues:

  1. https://github.com/apache/incubator-mxnet/issues/1557
  2. https://github.com/apache/incubator-mxnet/issues/10016

I did not get a clear answer from either issue. My environment is an AWS p3.2xlarge instance with Deep Learning Base AMI (Ubuntu) Version 6.0 (ami-ce3673b6).

Before running the model, I ran the following commands:

- pip install mxnet-cu80
- export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64


#4

If you’re loading the image for the first time and nothing is restored, you’re always going to pay that expensive JIT compilation cost. I’m not sure how to handle it on AWS, but you basically want to make sure your CUDA cache is retained between instance runs. I had a similar issue when using Docker with MXNet, which was resolved by making sure the cache carried across Docker runs, in my case simply by mapping the cache directory to a location on the host so that it was permanent. You can find that thread here: