How to get gradients using symbol API


#1

This is probably a silly question but I’m having a difficult time learning the autograd API and the Symbol API. I can’t seem to figure out how to compute the gradient of a function when the function is using symbols and not NDArrays. For example:

import mxnet as mx
from mxnet import autograd

x_in = mx.nd.array([2])
x_in.attach_grad()

X = mx.sym.Variable("X")

with autograd.record():
    F = X * X
    execute = F.bind(ctx=mx.cpu(0),args={'X' : x_in})
    out = execute.forward()
    
    grad = autograd.grad(out[0], [x_in])

This code gives an error: “Cannot differentiate node because it is not in a computational graph.”

I feel like I’m missing some sort of fundamental information about how the autograd API and the Symbol API work, but I can’t seem to find examples of gradients being calculated with symbols.

Thanks for any help!


#2

You don’t need to explicitly use autograd when using the Symbol API. Set the is_train argument to True on your forward pass and the executor will keep the information needed to run a backward pass and give you back the computed gradients.
You also need to allocate the memory for your gradients up front, through the args_grad argument of bind.

If you want to use symbols, the Module API is good at hiding these low-level details from you.

Otherwise I would suggest using Gluon :smile: (rough sketches of both of these approaches follow the example below)

import mxnet as mx

x_in = mx.nd.array([2])

X = mx.sym.Variable("X")
F = X * X

# args_grad provides the storage where the gradient w.r.t. X will be written
executor = F.bind(ctx=mx.cpu(0), args={"X": x_in}, args_grad={"X": mx.nd.zeros((1,))})

out = executor.forward(is_train=True).copy()  # is_train=True keeps what is needed for backward

# Note: 'out' ([4.]) is passed back as the head gradient, so the result is dF/dX * 4 = 16;
# pass mx.nd.ones((1,)) instead to get dF/dX = 4.
executor.backward(out)
print(executor.grad_arrays)
[[16.]<NDArray 1 @cpu(0)>]
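
For comparison, here is the Gluon-style, imperative way of getting the same gradient, using autograd directly on NDArrays instead of Symbols (a minimal sketch, assuming MXNet 1.x, not from the answer above):

import mxnet as mx
from mxnet import autograd

x_in = mx.nd.array([2])
x_in.attach_grad()            # allocate storage for the gradient w.r.t. x_in

with autograd.record():       # record NDArray operations (Symbol ops are not recorded)
    F = x_in * x_in

F.backward()                  # default head gradient of ones
print(x_in.grad)              # [4.] <NDArray 1 @cpu(0)>, i.e. dF/dx = 2x = 4 at x = 2

And a rough sketch of the Module API route mentioned above; the data names, shapes and calls here are my own illustration, not from the original answer:

import mxnet as mx

X = mx.sym.Variable("X")
F = X * X

mod = mx.mod.Module(symbol=F, data_names=["X"], label_names=None)
mod.bind(data_shapes=[("X", (1,))], inputs_need_grad=True)  # ask for gradients w.r.t. the inputs
mod.init_params()                                           # no learnable parameters here, but required before forward
batch = mx.io.DataBatch(data=[mx.nd.array([2.0])])
mod.forward(batch, is_train=True)
mod.backward(out_grads=[mx.nd.ones((1,))])                  # head gradient of ones -> dF/dX
print(mod.get_input_grads()[0])                             # [4.] <NDArray 1 @cpu(0)>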

#3

Is it possible to compute the Hessian or other higher-order gradients with this method? For example, is there a symbolic gradient operator whose output you could differentiate again?


#4

@bschrift please see this thread for second-order derivatives: Obtaining second order derivatives for a function wrt arbitrary parameters in the computation graph
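
For reference, a rough sketch of one way to get a second-order derivative with autograd, using autograd.grad with create_graph=True (an assumption on my part; it requires a recent MXNet version where the operators involved, such as sin, register higher-order gradients — see the linked thread for details):

import mxnet as mx
from mxnet import autograd

x = mx.nd.array([1.0])
x.attach_grad()

with autograd.record():
    y = mx.nd.sin(x)
    # create_graph=True records the gradient computation itself so it can be differentiated again
    dy_dx = autograd.grad(y, [x], create_graph=True, retain_graph=True)[0]

dy_dx.backward()
print(x.grad)   # d2y/dx2 = -sin(x) ≈ [-0.841] at x = 1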