For example:
import mxnet as mx

softmax1 = resnet_50()
softmax2 = resnet_50()
out = mx.sym.Group([softmax1, softmax2])
model = mx.module.Module(symbol=out, context=ctx, data_names=['data1', 'data2'], label_names=['softmax_label1','softmax_label2'])
# get_dataiter() yields DataBatches with data [('data1', [N, 3, 224, 224]), ('data2', [N, 3, 224, 224])];
# data1 and data2, and their labels, are identical.
train_dataiter = get_dataiter()
model.fit(train_dataiter, ......)
My code looks something like the above. Both ResNet-50 networks are initialized by the same initializer. However, the outputs of the two softmax branches differ from the very beginning of training, something like 27.x vs. 21.x.
This seems very strange to me.
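One possible explanation, which is an assumption and has not been verified against this code: a random initializer draws fresh values for every parameter it fills. Two separately built networks would then start from different weights even though the initializer object is the same. The sketch below illustrates only that idea in plain Python. `init_weights` is a hypothetical stand-in for an initializer and is not an MXNet API.

```python
import random

def init_weights(rng, n):
    """Hypothetical stand-in for an initializer: draw n uniform weights."""
    return [rng.uniform(-0.07, 0.07) for _ in range(n)]

# One shared random stream, as when a single initializer fills every parameter.
rng = random.Random(0)
branch1 = init_weights(rng, 4)  # weights for the first network
branch2 = init_weights(rng, 4)  # weights for the second network

# Same initializer, but successive draws differ, so the branches start apart.
same_init_differs = branch1 != branch2

# Copying the first branch's values into the second makes them identical.
branch2_copied = list(branch1)
copied_matches = branch1 == branch2_copied

print(same_init_differs, copied_matches)
```

If this is what is happening here, comparing the initial parameters of the two branches, or copying one branch's parameters into the other before training, would show whether the gap comes from initialization.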