Why does module.bind()'s for_training parameter have such a big influence on the results?

During inference, I create a module with mx.mod.Module() and then call mod.bind(...). If I pass for_training=False to mod.bind(...), the results match what I got during training. But if I don't set for_training at all, the results are completely wrong!

I can't figure out why. Any advice?