Question about param[:] = param - lr*param.grad/batch_size

I am not a Python expert, so I have a question about the code in the sgd() function:
for param in params:
    param[:] = param - lr*param.grad/batch_size

Why don’t we just write the code as follows?
for param in params:
    param = param - lr*param.grad/batch_size

The operation should be element-wise. What’s the difference between using param[:] and param? I tried changing the code to param, but then the output is incorrect. Any help is appreciated!

This form lets you assign new values to the existing NDArray instead of making the name reference another location in memory. Consider the following code:

import mxnet as mx

a = mx.random.uniform(shape=(10, 5))
print(hex(id(a))) # print the address of the object 'a' refers to

a[:] = a - 1 
print(hex(id(a))) # print memory address again

a = a - 1
print(hex(id(a))) # ...and again

The actual addresses will differ in your run, but the pattern is what matters.

As you can see, the first two addresses are the same: assigning to a[:] just overwrote the data of a in place. The last address is different: a now points to a different object in memory.
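The same effect can be reproduced without MXNet. This is a minimal sketch that assumes plain Python lists as stand-ins for NDArrays, since slice assignment on a list also overwrites the existing object. A second name, alias, plays the role of the params list that keeps referring to the original object:

```python
# Plain Python lists stand in for NDArrays here (an assumption for illustration);
# slice assignment on a list also overwrites the existing object in place.
a = [1.0, 2.0, 3.0]
alias = a  # a second reference to the same object, like params holding param

a[:] = [x - 1 for x in a]  # overwrite the contents of the existing object
print(alias is a, alias)   # True [0.0, 1.0, 2.0]

a = [x - 1 for x in a]     # build a new list and rebind only the name 'a'
print(alias is a, alias)   # False [0.0, 1.0, 2.0]
```

After the rebinding, alias still holds the old values, just as params would.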

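As @Sergey has explained, using [:] writes into the existing NDArray instead of rebinding the name. The right-hand side “param - lr*param.grad/batch_size” is still computed as a temporary array, but param[:] = ... copies its values into param itself. With plain param = ..., only the loop variable is rebound to the new array; the params list (and therefore the model) still holds the old, unchanged arrays, which is why your output was incorrect.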
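Here is a minimal sketch of both loops side by side. It again uses plain Python lists as hypothetical stand-ins for NDArrays and passes gradients as a separate grads list, because lists have no .grad attribute; sgd_rebind and sgd_inplace are illustrative names, not d2l functions:

```python
def sgd_rebind(params, grads, lr, batch_size):
    for param, grad in zip(params, grads):
        # Rebinds the loop variable only; params still holds the old lists.
        param = [p - lr * g / batch_size for p, g in zip(param, grad)]

def sgd_inplace(params, grads, lr, batch_size):
    for param, grad in zip(params, grads):
        # Writes the new values into the list object that params holds.
        param[:] = [p - lr * g / batch_size for p, g in zip(param, grad)]

w = [1.0, 2.0]   # a hypothetical "weight" parameter
g = [0.5, 1.0]   # its gradient
params = [w]

sgd_rebind(params, [g], lr=0.5, batch_size=2)
print(w)  # [1.0, 2.0]  (unchanged)

sgd_inplace(params, [g], lr=0.5, batch_size=2)
print(w)  # [0.875, 1.75]
```

In MXNet the reasoning carries over: params keeps referencing the original NDArrays, which are also the arrays whose gradients are recorded, so only an in-place write actually updates the model.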

And welcome back again to the community.

Thank you @Sergey and @mouryarishik for the very detailed explanation with code examples. I completely understand it now.