To implement model parallelism in a symbolic loss calculation with Horovod, I wrote a CustomOp that calls horovod.mxnet.allreduce.
The customized op is used when constructing the symbol, like this:
global_a = mx.symbol.Custom(data=local_a, op_type='hvd_allreduce', average=average, op_name='a')
global_b = mx.symbol.Custom(data=local_b, op_type='hvd_allreduce', average=average, op_name='b')
c = global_a / global_b
However, different processes execute global_a and global_b in different orders, which causes a deadlock in Horovod. So the key is to force global_a and global_b to execute in the same order on every process.
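One possible workaround, which avoids the ordering problem rather than solving it, would be to pack local_a and local_b into a single buffer so that each process issues only one allreduce. The sketch below illustrates the idea in plain Python with a simulated allreduce. The helpers allreduce_sum, pack, unpack, and fused_allreduce are hypothetical names invented for this illustration and are not part of Horovod or MXNet. In symbol terms this would presumably map to something like mx.symbol.concat before the custom op and mx.symbol.slice_axis after it, but that mapping is an untested assumption.

```python
from typing import List, Sequence, Tuple


def allreduce_sum(per_rank_buffers: Sequence[Sequence[float]]) -> List[List[float]]:
    """Simulated allreduce (hypothetical stand-in, not the Horovod API):
    every rank receives the elementwise sum over all ranks."""
    total = [sum(vals) for vals in zip(*per_rank_buffers)]
    return [list(total) for _ in per_rank_buffers]


def pack(a: Sequence[float], b: Sequence[float]) -> Tuple[List[float], int]:
    """Concatenate two flat tensors and remember where the first one ends."""
    return list(a) + list(b), len(a)


def unpack(buf: Sequence[float], split: int) -> Tuple[List[float], List[float]]:
    """Split a packed buffer back into its two parts."""
    return list(buf[:split]), list(buf[split:])


def fused_allreduce(local_as, local_bs):
    """Issue ONE collective per rank instead of two, so there is no
    relative order between 'a' and 'b' that could differ across ranks."""
    packed = []
    split = 0
    for a, b in zip(local_as, local_bs):
        buf, split = pack(a, b)
        packed.append(buf)
    reduced = allreduce_sum(packed)
    return [unpack(buf, split) for buf in reduced]


# Two simulated ranks, each holding its own local_a and local_b.
local_as = [[1.0, 2.0], [3.0, 4.0]]
local_bs = [[2.0], [2.0]]

results = fused_allreduce(local_as, local_bs)
global_a, global_b = results[0]

# Same final step as in the symbol above: c = global_a / global_b.
c = [x / global_b[0] for x in global_a]
```

Because every process makes exactly one collective call, the engine has nothing to reorder. The trade-off is that both tensors must be flattened to a common layout before packing and reshaped afterwards.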
My question: is there any way to force global_a and global_b to execute in a fixed order? Any suggestions are highly appreciated.