Same input, different output

I find that the first output is different from the second output, even though the input is the same.
Is this a bug?
Thank you!

from mxnet import gluon, image, init, nd, autograd, gpu, cpu
from mxnet.gluon import data as gdata, loss as gloss, model_zoo, nn
import numpy as np
import mxnet as mx

ctx = gpu(0)
conv = nn.Conv2D(in_channels=3, channels=32, kernel_size=3,
              strides=1, padding=1, use_bias=False)
bn = nn.BatchNorm(momentum=0.1, in_channels=32)
ru = nn.LeakyReLU(alpha=0.2)
convt = nn.Conv2DTranspose(in_channels=32, channels=64, kernel_size=3,
                   strides=2, padding=1, output_padding=1, use_bias=False)
model = [conv,bn,ru,convt]
net = nn.HybridSequential()
with net.name_scope():
    for layer in model:
        net.add(layer)

net.collect_params('.*gamma|.*running_var').initialize(mx.init.Constant(1), ctx=ctx)


with autograd.record():
    y = net(x.as_in_context(ctx))
    # np.save('dd1', y.asnumpy())  # uncomment on the first run to save the reference output

    s_pred = np.load('dd1.npy')
    t_pred = y.asnumpy()
    sub = s_pred - t_pred
    print(sub.sum())

I’ve tried your repro and isolated the “faulty” layer: Conv2DTranspose.
This might come from the cuDNN or cuBLAS algorithm that gets selected, which may be nondeterministic because of floating-point precision.
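
If the algorithm selection is the cause, one thing to try is asking MXNet for deterministic behaviour through environment variables. The sketch below is only a suggestion: `MXNET_CUDNN_AUTOTUNE_DEFAULT` and `MXNET_ENFORCE_DETERMINISM` are listed in the MXNet environment variable documentation, but whether they are honoured depends on the MXNet version, and it is an assumption that they remove the difference seen with Conv2DTranspose.

```python
import os

# Set these before `import mxnet`, so the backend sees them at load time.
# Disable cuDNN autotuning, which may pick a different convolution
# algorithm from one run to the next.
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"

# Ask MXNet to use only deterministic algorithms (available in newer
# releases; assumed here, check the docs for your version).
os.environ["MXNET_ENFORCE_DETERMINISM"] = "1"

# import mxnet as mx  # import only after the variables are set
```

If the outputs still differ after this, the cause is probably somewhere else.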

The absolute mean difference is between 5.2916054e-13 and 5.2916054e-12, so it should not affect most applications.
Did you encounter a case where it does?
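
To check whether a difference of this size matters, it can help to look at absolute differences and a tolerance instead of the signed sum, since positive and negative errors cancel in a sum. Here is a minimal sketch with NumPy; `compare_outputs` and the stand-in arrays are hypothetical and only illustrate the idea, they are not the outputs from the repro.

```python
import numpy as np

def compare_outputs(a, b, atol=1e-6):
    """Summarize how far two output arrays are from each other."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    diff = np.abs(a - b)
    return {
        "signed_sum": float((a - b).sum()),  # errors of opposite sign can cancel here
        "mean_abs": float(diff.mean()),      # average magnitude of the error
        "max_abs": float(diff.max()),        # worst single element
        "allclose": bool(np.allclose(a, b, rtol=0.0, atol=atol)),
    }

# Stand-in data: a "first run" and a "second run" that differ by tiny noise.
rng = np.random.default_rng(0)
first = rng.standard_normal((1, 64, 8, 8)).astype(np.float32)
noise = rng.uniform(-1e-7, 1e-7, size=first.shape).astype(np.float32)
second = first + noise

report = compare_outputs(first, second)
print(report)
```

With the real data, `first` would be `np.load('dd1.npy')` and `second` would be `y.asnumpy()`. The tolerance `atol=1e-6` is an arbitrary choice for float32 outputs.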

Thank you!
I am converting PyTorch code to MXNet code.
PyTorch code:
I did the same as the PyTorch code.
But my code can’t reach the same top result as the PyTorch code.

I tried to compare the outputs of PyTorch and MXNet.
I found that two runs of the MXNet code give different outputs right after initialization, while two runs of the PyTorch code give the same output.

I don’t know whether this is the reason why the MXNet code can’t reach the top result.
I am trying to solve it.