NaN generated when I use backward for a symbol

Hello everyone, I've run into a problem where grad_array is always NaN and the output bw is None. My code is below. Can anyone give me a hand? Many thanks!

-----------------------------Code-----------------------------

import os
import time
import numpy as np
import mxnet as mx
import mxnet.module as mod
from mxnet import ndarray as nd

devices = mx.cpu()

sigma_2 = 0.25
n = 48 # M = 12 D = 4
C = 120
K = 3
d = 256

data = mx.sym.Variable('data')
E_batch = mx.sym.FullyConnected(data=data, name='E_batch', num_hidden=d)
reps = mx.sym.Variable(name='reps')

reps_expand = mx.sym.expand_dims(data=reps, name='reps_expand', axis=0)
reps_repeat = mx.sym.repeat(data=reps_expand, name='reps_repeat', repeats=n, axis=0)

E_batch_expand1 = mx.sym.expand_dims(data=E_batch, name='E_batch_expand1', axis=1)
E_batch_expand2 = mx.sym.expand_dims(data=E_batch_expand1, name='E_batch_expand2', axis=1)
E_batch_repeat1 = mx.sym.repeat(data=E_batch_expand2, name='E_batch_repeat1', repeats=C, axis=1)
E_batch_repeat2 = mx.sym.repeat(data=E_batch_repeat1, name='E_batch_repeat2', repeats=K, axis=2)

distance = mx.sym.sum(data=(reps_repeat - E_batch_repeat2) ** 2, axis=3)  # (n, C, K): squared distances
prob = mx.sym.max(data=mx.sym.exp(-distance / (2 * sigma_2)), axis=2)     # (n, C): best match over K


rand = np.random.RandomState(seed=123)
prob_norm = mx.sym.norm(data=prob, name='prob_norm', axis=1)
label = mx.sym.Variable(name='label')
loss = mx.sym.SoftmaxOutput(data=prob_norm, label=label, name='loss')

E_batch_size = (n, d)
reps_size = (C, K, d)
weight_size = (d, d)
bias_size = (d,)
e = mx.nd.array(rand.uniform(0, 10, E_batch_size), ctx=devices)
r = mx.nd.array(rand.uniform(0, 10, reps_size), ctx=devices)
l = mx.nd.array(rand.randint(0, 2, n), ctx=devices)
E_batch_w = mx.nd.array(rand.uniform(0, 2, size=weight_size), ctx=devices)
# args:
args = {'data': e, 'reps': r, 'label': l, 'E_batch_weight': E_batch_w, 'E_batch_bias': mx.nd.array(rand.uniform(0, 2, d), ctx=devices)}

# args_grad:
args_grad = {'E_batch_weight': mx.nd.zeros((d, d)), 'E_batch_bias': mx.nd.zeros((d,))}
executor = loss.bind(ctx=devices, args=args, args_grad=args_grad)

for i in range(20):
    out = executor.forward(is_train=True)[0].copy()
    # print("Data: %s" % e.asnumpy())
    # print("Label: %s" % r.asnumpy())
    # print("out: %s" % executor.outputs[0].asnumpy())
    bw = executor.backward()

print(bw)
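Note on the None output: Executor.backward() has no return value, so bw will always print as None; the computed gradients are accumulated into the NDArrays that were passed as args_grad at bind time. A minimal way to inspect them after a backward pass, reusing the executor and args_grad defined above:

executor.forward(is_train=True)
executor.backward()
# backward() returns None; the gradients were written into the
# NDArrays supplied via args_grad when the executor was bound
for name, grad in args_grad.items():
    print(name, grad.asnumpy())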

I don't see where the training happens; it seems like the code never actually updates the weights.
If you want to stay with symbols, I recommend using Module to do the training, for example as sketched below: https://mxnet.incubator.apache.org/tutorials/basic/module.html#creating-a-module
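Here is a minimal, untested sketch of what that could look like with the loss symbol above. Since 'reps' is not listed in data_names, Module will treat it as a learnable parameter; the optimizer settings and epoch count are placeholders:

train_iter = mx.io.NDArrayIter(data={'data': e.asnumpy()},
                               label={'label': l.asnumpy()},
                               batch_size=n)
net = mx.mod.Module(symbol=loss,
                    data_names=['data'],
                    label_names=['label'],
                    context=mx.cpu())
# fit() binds the module, initializes the remaining parameters, and
# runs the forward/backward/update loop for you
net.fit(train_iter,
        optimizer='sgd',
        optimizer_params={'learning_rate': 0.01},
        num_epoch=5)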

Alternatively, you can use the Gluon API (rough sketch below): https://gluon-crash-course.mxnet.io/
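As a rough illustration of the Gluon training loop, using a single Dense layer as a hypothetical stand-in for the model (not the symbol defined above):

import numpy as np
from mxnet import autograd, gluon, nd

net = gluon.nn.Dense(2)  # hypothetical stand-in with 2 output classes
net.initialize()
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})

x = nd.array(np.random.uniform(0, 10, (48, 256)))
y = nd.array(np.random.randint(0, 2, 48))

for epoch in range(5):
    with autograd.record():
        batch_loss = loss_fn(net(x), y)
    batch_loss.backward()          # gradients land on the parameters
    trainer.step(batch_size=48)    # the Trainer actually updates the weights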

Thanks for your reply. I think I have solved the problem.