Run-time discrepancies of v0.9.5 vs. v0.11.0 on TX2

leonid · October 4, 2017, 10:01pm

I see a large discrepancy in ResNet-18 run-time on a Jetson TX2 between MXNet v0.9.5 and v0.11.0 when running with batch = 1: 91 ms/frame vs. 124 ms/frame, respectively. nvvp shows that the same conv kernels are called in both versions, but they take ~2x longer in v0.11.0. Do you have any idea why that might happen?
Experimental setup:

Jetson TX2 with Jetpack 3.1 (cuDNN 6.0, CUDA 8.0)
ResNet-18 based on resnet.py symbol available from repo
input resolution 640x480
batch = 1

Running with batch = 8 results in similar run-times in both cases.
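
For reference, a per-frame number like the ones above could be measured with a harness along these lines. This is only a minimal sketch, not the script used for the figures in this post: `benchmark`, `summarize`, `run_once`, and `sync` are hypothetical names, and the workload in the demo is a CPU stand-in. With MXNet, `run_once` would presumably wrap the forward pass and `sync` would be something like `mx.nd.waitall`, so that the asynchronous engine has finished before the timer is read.

```python
import statistics
import time


def benchmark(run_once, sync=lambda: None, warmup=10, iters=50):
    """Return a list of per-call latencies in milliseconds.

    run_once: callable that launches one forward pass (hypothetical hook).
    sync:     callable that blocks until queued work has finished; with
              MXNet this is assumed to be mx.nd.waitall.
    """
    # Warm-up runs are excluded from the measurement (first calls often
    # include one-time setup such as algorithm selection).
    for _ in range(warmup):
        run_once()
    sync()

    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        sync()  # without this, only the launch overhead would be timed
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Collapse raw latencies into a few summary statistics."""
    return {
        "mean_ms": statistics.mean(latencies),
        "median_ms": statistics.median(latencies),
        "min_ms": min(latencies),
    }


if __name__ == "__main__":
    # Stand-in CPU workload so the sketch runs without a GPU or MXNet.
    stats = summarize(benchmark(lambda: sum(range(10000))))
    print(stats)
```

Running the same harness unchanged against both MXNet versions, once with batch = 1 and once with batch = 8, would make the two sets of numbers directly comparable.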

smolix · October 5, 2017, 4:11am

Could you please post a code snippet? That’ll help us figure out what is going on. Also, did you compare with the latest version in Git?
