Any way to discard the unused `CUDA_ARCH` targets in `libmxnet.dll`?

A smaller .dll/.so file generally loads faster. Right now, libmxnet.dll is very large and takes a long time to load on my Windows 10 machine.
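To put a number on the load time, a minimal sketch like the one below could be used. It only measures the wall-clock time of a module's import in the current process. The helper name `time_import` is hypothetical, and the measurement is only meaningful on the first import, since later imports are served from `sys.modules`.

```python
import importlib
import time


def time_import(module_name: str) -> float:
    """Return the wall-clock seconds spent importing `module_name`.

    Hypothetical helper for illustration. If the module was already
    imported in this process, the result will be close to zero because
    Python returns the cached module from sys.modules.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start
```

For example, `time_import("mxnet")` in a fresh interpreter would give the figure to compare before and after any change to the library. Running `python -X importtime -c "import mxnet"` gives a per-module breakdown as an alternative.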

It seems that if we had a relocatable .dll/.so file, we could use nvprune to discard all the unused CUDA_ARCHes and get a smaller .dll file.

I tried nvprune, and it gives me this error:

nvprune fatal   : Input file 'libmxnet.dll' not relocatable

Making libmxnet.dll/libmxnet.so relocatable would make it possible to further reduce the size of the library, and might also reduce the import time.
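For context, here is a rough sketch of the kind of nvprune invocation in question. As far as I understand the CUDA binary utilities documentation, nvprune works on relocatable host objects and static libraries, so pruning would probably have to happen on the inputs before the final link rather than on the finished .dll. The helper name `build_nvprune_command` and the file names and `sm_61` target in the usage note are placeholders for illustration, not the exact command that produced the error above.

```python
from typing import List


def build_nvprune_command(input_path: str, output_path: str, archs: List[str]) -> List[str]:
    """Build an nvprune command line that keeps only the listed GPU targets.

    Hypothetical helper: it only assembles the argument list and does not
    run anything. Assumes nvprune's documented `-arch` (comma-separated
    architecture names) and `-o` (output file) options.
    """
    if not archs:
        raise ValueError("at least one architecture, e.g. 'sm_61', is required")
    return ["nvprune", "-arch", ",".join(archs), "-o", output_path, input_path]
```

For example, `build_nvprune_command("libmxnet_static.lib", "libmxnet_pruned.lib", ["sm_61"])` yields an argument list that can be passed to `subprocess.run` on a machine with the CUDA toolkit installed. Whether nvprune accepts an MSVC `.lib` archive in practice is an assumption I have not verified.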

It seems the symbols are contained in libmxnet.lib, but I cannot merge them into libmxnet.dll. If someone finds a way to redistribute a libmxnet.dll with symbols, importing mxnet in Python may take less time.

Any ideas?

Any ideas?
I have been waiting for a response for days.
This does not seem very difficult for a C programmer.

Any ideas, @leleamol?

Is anyone interested in this problem?