GluonCV + Horovod = segfault 11?

Hi,
I'm adapting this GluonCV script https://gluon-cv.mxnet.io/build/examples_classification/dive_deep_cifar10.html to use Horovod on a single AWS p3.16xlarge instance. I edited the script using the Horovod snippets from https://github.com/horovod/horovod/blob/master/docs/mxnet.rst

However, training fails with:

Segmentation fault: 11

Stack trace:
  [bt] (0) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2f9cf20) [0x7f7b6974ff20]
  [bt] (1) /lib64/libc.so.6(+0x362f0) [0x7f7c43d2b2f0]
  [bt] (2) /lib64/libpthread.so.0(pthread_mutex_lock+0) [0x7f7c440cbc40]
  [bt] (3) /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2364ea7) [0x7f7a003cdea7]
  [bt] (4) /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2368153) [0x7f7a003d1153]
  [bt] (5) /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x235d9c6) [0x7f7a003c69c6]
  [bt] (6) /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/mxnet/libmxnet.so(MXEnginePushAsync+0x2f7) [0x7f7a0034a707]
  [bt] (7) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/horovod/mxnet/mpi_lib.cpython-36m-x86_64-linux-gnu.so(horovod_mxnet_broadcast_async+0x1f0) [0x7f7a02ebff60]
  [bt] (8) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f7b998b6ec0] 

I am using gluoncv 0.5.0, horovod 0.18.2 and mxnet 1.5.0.
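One detail in the trace that may be worth ruling out: frames (0), (7) and (8) resolve from the `mxnet_p36` conda env, while frames (3) to (6) load `libmxnet.so` from `envs/python3`, so two copies of `libmxnet.so` appear to be mapped into the same process. This is a hypothesis, not a confirmed cause of the segfault. The stdlib-only sketch below lists, for each shared object in a pasted trace, the conda environments it was loaded from. It assumes the standard Anaconda `envs/<name>/` path layout, and the function names are hypothetical helpers, not part of MXNet or Horovod.

```python
import re
from collections import defaultdict

# Captures the conda environment name ("envs/<env>/") and the file name of the
# shared object (e.g. "libmxnet.so" or "libffi.so.6") on one backtrace line.
# Assumption: environments live under an ".../envs/<name>/" directory.
FRAME_RE = re.compile(r"/envs/(?P<env>[^/]+)/.*?(?P<lib>[^/\s(]+\.so(?:\.\d+)*)")


def envs_per_library(trace):
    """Map each shared object in the trace to the set of conda envs it was loaded from."""
    found = defaultdict(set)
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match:  # system libraries such as /lib64/libc.so.6 have no "envs/" part and are skipped
            found[match.group("lib")].add(match.group("env"))
    return dict(found)


def libraries_from_several_envs(trace):
    """Return only the libraries that were resolved from more than one conda env."""
    return {lib: envs for lib, envs in envs_per_library(trace).items() if len(envs) > 1}


if __name__ == "__main__":
    # A few frames copied from the trace in this post.
    trace = """
  [bt] (0) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2f9cf20) [0x7f7b6974ff20]
  [bt] (2) /lib64/libpthread.so.0(pthread_mutex_lock+0) [0x7f7c440cbc40]
  [bt] (3) /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2364ea7) [0x7f7a003cdea7]
  [bt] (7) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/horovod/mxnet/mpi_lib.cpython-36m-x86_64-linux-gnu.so(horovod_mxnet_broadcast_async+0x1f0) [0x7f7a02ebff60]
"""
    for lib, envs in sorted(libraries_from_several_envs(trace).items()):
        print(lib, "->", sorted(envs))
```

Run on those frames, it prints `libmxnet.so -> ['mxnet_p36', 'python3']`. If the mix is not intentional, launching the job with the interpreter of a single environment (the one where Horovod was built against MXNet) would be one way to test the hypothesis.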