How to get deterministic results in different runs?

I would like to know how to make results deterministic across runs. I am trying to parallelize a model using Horovod, and I want the results to be identical no matter how many processes are used, as long as the hyper-parameters are the same.

I started with the simplest MNIST example, https://github.com/apache/incubator-mxnet/blob/master/example/distributed_training-horovod/gluon_mnist.py. Assuming the differences between runs come from the random seeds, I fixed the seeds by adding the following code to the file:

import numpy as np
import random

mx.random.seed(1234)
np.random.seed(1234)
random.seed(1234)

I ran the file with only one process, but the output accuracy still differs between runs. I also tried setting shuffle to True in both the train and val iterators, but the results were still not reproducible. So how can I make the results deterministic across runs?
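For reference, here is a minimal sketch of keeping all the seeding in one place and checking that re-seeding really reproduces the same draws. The helper name seed_everything and its extra_seeders parameter are hypothetical, not part of MXNet or NumPy. The check only covers the generators that were seeded, so it says nothing about other sources of nondeterminism in training.

```python
import random


def seed_everything(seed, extra_seeders=()):
    """Seed Python's RNG and any additional seeding functions.

    Hypothetical helper (not an MXNet API). `extra_seeders` is an iterable
    of callables that each accept an integer seed, for example
    np.random.seed or mx.random.seed.
    """
    random.seed(seed)
    for seeder in extra_seeders:
        seeder(seed)


def draw(n=5):
    """Draw n numbers from Python's global generator."""
    return [random.random() for _ in range(n)]


# Seed, draw, then seed again with the same value and draw again.
seed_everything(1234)
first = draw()
seed_everything(1234)
second = draw()

print(first == second)  # True: the seeded generator is reproducible
```

In the MNIST script this would be called as seed_everything(1234, extra_seeders=[np.random.seed, mx.random.seed]), assuming numpy and mxnet are imported as np and mx.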

I found the same problem reported in https://github.com/apache/incubator-mxnet/issues/10831, but that issue was never resolved.

Why has no one answered this question? I think it is a very important one.

Did you set the seed for each device? MXNet uses the device ID when setting the state of each random number generator, so random numbers generated on different devices can differ even when the same seed is used. To get the same random numbers on every device, pass the context (ctx) when seeding:

import mxnet as mx

# Seed GPU 0 explicitly, then draw from it.
mx.random.seed(128, ctx=mx.gpu(0))
print(mx.nd.random.normal(shape=(2,2), ctx=mx.gpu(0)).asnumpy())
# [[ 2.5020072 -1.6884501]
#  [-0.7931333 -1.4218881]]

# Seed GPU 1 with the same value; the draw matches GPU 0.
mx.random.seed(128, ctx=mx.gpu(1))
print(mx.nd.random.normal(shape=(2,2), ctx=mx.gpu(1)).asnumpy())
# [[ 2.5020072 -1.6884501]
#  [-0.7931333 -1.4218881]]
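To illustrate why a single global seed can still give different numbers on different devices, here is a toy model in plain Python. It is not MXNet's implementation: the class ToyDeviceRNGs and the way it mixes the device ID into the seed (seed + device ID) are assumptions made purely for illustration.

```python
import random


class ToyDeviceRNGs:
    """Toy stand-in for per-device generators; NOT MXNet's implementation."""

    def __init__(self, num_devices):
        # One independent generator per (pretend) device.
        self.generators = [random.Random() for _ in range(num_devices)]

    def seed(self, seed, device_id=None):
        if device_id is None:
            # Global seeding: each device mixes its own ID into the seed,
            # so the devices end up in different states.
            for dev, gen in enumerate(self.generators):
                gen.seed(seed + dev)
        else:
            # Per-device seeding: use the seed as-is, ignoring the device ID.
            self.generators[device_id].seed(seed)

    def normal(self, device_id, n=4):
        gen = self.generators[device_id]
        return [gen.gauss(0.0, 1.0) for _ in range(n)]


rngs = ToyDeviceRNGs(num_devices=2)

# Global seed: the two devices produce different draws.
rngs.seed(128)
global_dev0, global_dev1 = rngs.normal(0), rngs.normal(1)
print(global_dev0 == global_dev1)  # False

# Same seed applied to each device explicitly: the draws match.
rngs.seed(128, device_id=0)
rngs.seed(128, device_id=1)
per_dev0, per_dev1 = rngs.normal(0), rngs.normal(1)
print(per_dev0 == per_dev1)  # True
```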

For more details, you can check the following documentation: https://mxnet.incubator.apache.org/api/python/symbol/random.html#mxnet.random.seed

I only use one GPU, so the context should not matter much. I nevertheless tried adding ctx=mx.gpu(0) to the seeding call in my code, and the output accuracy still differs between runs.
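One way to narrow down where two runs start to diverge is to log a fingerprint of the parameters at fixed points (right after initialization, after the first batch, after the first epoch) and compare the logs. The sketch below uses only the standard library. The function name fingerprint and the parameter names in the sample data are made up for illustration.

```python
import hashlib
import struct


def fingerprint(named_values):
    """Return a short hex digest of a {name: list of floats} mapping."""
    digest = hashlib.sha256()
    # Sort by name so the digest does not depend on dict ordering.
    for name, values in sorted(named_values.items()):
        digest.update(name.encode("utf-8"))
        # Pack as little-endian doubles so equal values give equal bytes.
        digest.update(struct.pack(f"<{len(values)}d", *values))
    return digest.hexdigest()[:16]


# Made-up parameter snapshots from three pretend runs.
run_a = {"dense0_weight": [0.1, -0.2, 0.3], "dense0_bias": [0.0]}
run_b = {"dense0_weight": [0.1, -0.2, 0.3], "dense0_bias": [0.0]}
run_c = {"dense0_weight": [0.1, -0.2, 0.3000001], "dense0_bias": [0.0]}

print(fingerprint(run_a) == fingerprint(run_b))  # True: identical snapshots
print(fingerprint(run_a) == fingerprint(run_c))  # False: one value differs
```

With a Gluon model, the mapping could be built from net.collect_params(), flattening each param.data().asnumpy() to a list of floats (net here stands for whatever model the script defines). If the fingerprints already differ right after initialization, the seeding is the problem. If they only differ after training steps, the nondeterminism is more likely in the data pipeline or the operators.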