Revert to previous symbolic graph


#1

Hi all. I’m attempting to store an old version of a symbolic graph and compare it to an updated version. In the C++ API I don’t see a clear way to do this. My guesses were to try creating independent Executor variables and to reassign the Executor->aux_array variable, but neither of these produces the intended result. My current guess is that the internal node state is stored in the aux_array of the symbol itself, and that I could group symbols to more easily capture and reassign internal state. I’m not sure, though, so any help is appreciated.


#2

Can you expand on what you mean by “old version” and “updated version”? Can you try to explain what you are trying to achieve?


#3

Sure. It’s a 3-step process.

  1. Keep a copy of the weights/biases in a symbolic graph (of a neural network, for example).
  2. Perform one round of gradient descent to update node weights.
  3. Compare the updated weights to the old weights.

This is a common thing to do when checking the progress of a network in general. Specifically, I am using it for updates to a reinforcement learning network.


#4

If you use Python and the Gluon interface this is quite trivial, since you can simply get the values of a layer by calling .data() on the parameter. You might actually be more interested in checking the gradients of a given parameter specifically.

Look at the code snippet below:

>>> net = gluon.model_zoo.vision.get_model('alexnet', pretrained=True)
>>> net
AlexNet(
  (features): HybridSequential(
    (0): Conv2D(3 -> 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False)
    (2): Conv2D(64 -> 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (3): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False)
    (4): Conv2D(192 -> 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): Conv2D(384 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False)
    (8): Flatten
    (9): Dense(9216 -> 4096, Activation(relu))
    (10): Dropout(p = 0.5, axes=())
    (11): Dense(4096 -> 4096, Activation(relu))
    (12): Dropout(p = 0.5, axes=())
  )
  (output): Dense(4096 -> 1000, linear)
)
>>> net.collect_params()
alexnet3_ (
  Parameter alexnet3_conv0_weight (shape=(64, 3, 11, 11), dtype=<class 'numpy.float32'>)
  Parameter alexnet3_conv0_bias (shape=(64,), dtype=<class 'numpy.float32'>)
  Parameter alexnet3_conv1_weight (shape=(192, 64, 5, 5), dtype=<class 'numpy.float32'>)
  Parameter alexnet3_conv1_bias (shape=(192,), dtype=<class 'numpy.float32'>)
  Parameter alexnet3_conv2_weight (shape=(384, 192, 3, 3), dtype=<class 'numpy.float32'>)
  Parameter alexnet3_conv2_bias (shape=(384,), dtype=<class 'numpy.float32'>)
  Parameter alexnet3_conv3_weight (shape=(256, 384, 3, 3), dtype=<class 'numpy.float32'>)
  Parameter alexnet3_conv3_bias (shape=(256,), dtype=<class 'numpy.float32'>)
  Parameter alexnet3_conv4_weight (shape=(256, 256, 3, 3), dtype=<class 'numpy.float32'>)
  Parameter alexnet3_conv4_bias (shape=(256,), dtype=<class 'numpy.float32'>)
  Parameter alexnet3_dense0_weight (shape=(4096, 9216), dtype=float32)
  Parameter alexnet3_dense0_bias (shape=(4096,), dtype=float32)
  Parameter alexnet3_dense1_weight (shape=(4096, 4096), dtype=float32)
  Parameter alexnet3_dense1_bias (shape=(4096,), dtype=float32)
  Parameter alexnet3_dense2_weight (shape=(1000, 4096), dtype=float32)
  Parameter alexnet3_dense2_bias (shape=(1000,), dtype=float32)
)
>>> net.collect_params()['alexnet3_dense2_weight'].data()
[[ 0.03272552 -0.00615264 -0.00395727 ...  0.01601001  0.04564209
  -0.01583865]
 [-0.02810573  0.03934928 -0.00352019 ... -0.02502617  0.02647938
  -0.01590014]
 [-0.00189331 -0.00041484 -0.00807727 ... -0.00933564  0.02027086
  -0.01356347]
 ...
 [-0.0248676  -0.03498199  0.0131346  ... -0.00824636  0.04538036
  -0.00425714]
 [ 0.02524208 -0.00262971 -0.01093859 ... -0.00907445 -0.06151189
  -0.00088412]
 [-0.00388105  0.00896387 -0.00179079 ...  0.02286987  0.0041664
   0.01847812]]
<NDArray 1000x4096 @cpu(0)>
>>> net.collect_params()['alexnet3_dense2_weight'].grad()
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
<NDArray 1000x4096 @cpu(0)>

For the C++ API, I believe you should look at in_arg_map() for values and in_grad_map() for gradients; let me know if that works for you. The aux_array is used for storing auxiliary parameters that are not learned by gradient descent, such as the batch-norm statistics accumulators.


#5

I am able to get the gradients. There is an Executor.grad_arrays variable that is filled after a backward pass. The gradients are then passed to an optimizer to perform the update. It looks something like this:

executor->Forward(true); // train = true so that the appropriate argument arrays are stored
executor->Backward();    // called after Forward(true) to fill the gradient arrays

int i = 0;
for (auto &name : argNames) {
    // Update each parameter array with its gradient array,
    // skipping the input and label arrays.
    if (name != "input" && name != "label")
        optimizer->Update(i, executor->arg_arrays[i], executor->grad_arrays[i]);
    i++;
}

This leads me to believe that storing the argument map (like the argument arrays, but associated with symbol names) and passing it to the executor will cause the executor to use this information during the forward/backward pass. This doesn’t work. Even re-binding the executor with the initial argument map doesn’t work. I am pretty sure std::map assignments are deep copies, but that’s the only issue I can think of.


#6

You need to use NDArray.Copy() to create a deep copy of the NDArray. Assigning an NDArray (or a map of them) only copies a handle to the same underlying data.


#7

Yup –facepalm–. Thanks


#8

Can you show in detail how to make the executor update? I may be hitting the same problem! Here is my code:
Mat imgOrig = imread("../2.jpg");
float mean[3] = {0.485, 0.456, 0.406};
float std[3] = {0.229, 0.224, 0.225};
GetFeatureSymbol();
LoadParameters();
NDArray img_tensor = loadImageTensor(imgOrig, 1000, 512, mean, std, global_ctx);
/* bind the executor */
//args_map["data"] = img_tensor;
args_map["data"] = NDArray(Shape(1, 3, 1000, 1000), global_ctx);
//net.InferArgsMap(global_ctx, &args_map, args_map);
auto *executor = net.SimpleBind(global_ctx, args_map, map<string, NDArray>(),
                                map<string, OpReqType>(), aux_map);
//args_map = executor->arg_dict();

NDArray img_tensor3 = loadImageTensor(imgOrig, 1000, 512, mean, std, global_ctx);
img_tensor3.Copy(global_ctx).CopyTo(&args_map["data"]);
NDArray::WaitAll();
//NDArray* nd = &executor->arg_arrays[0];
//img_tensor.CopyTo(nd);
//args_map["data"] = img_tensor;
extractAndShowOneImage(executor, imgOrig, img_tensor3);

imgOrig = imread("../3.jpg");
NDArray img_tensor4 = loadImageTensor(imgOrig, 1000, 512, mean, std, global_ctx);
img_tensor4.Copy(global_ctx).CopyTo(&args_map["data"]);
NDArray::WaitAll();
//executor->arg_arrays[0] = img_tensor;
extractAndShowOneImage(executor, imgOrig, img_tensor4);
//img_tensor = loadImageTensor(imgOrig, 1000, 512, mean, std, global_ctx);
extractAndShowOneImage(executor, imgOrig, img_tensor4);
return 0;

#9

It is unclear whether this is the same problem. To deep-copy the arguments for a new executor, see below:

void copyNDArrayMap(std::map<std::string, NDArray> &dst,
                    const std::map<std::string, NDArray> &src) {
    for (auto it = src.begin(); it != src.end(); it++)
        dst[it->first] = it->second.Copy(Context::cpu());
}

std::map<std::string, NDArray> args;
// Define args
// ...
Symbol s1 = <some symbol definition>;
auto *exec1 = s1.SimpleBind(Context::cpu(), args);

std::map<std::string, NDArray> args2;
copyNDArrayMap(args2, args);
auto *exec2 = s1.SimpleBind(Context::cpu(), args2);