I’m training resnet18_v1 on the ImageNet dataset with the officially provided Gluon training script: https://gluon-cv.mxnet.io/_downloads/3bb06a6d6d085b1bb501b30aaf6c21c5/train_imagenet.py
When I tried to access the model parameters via:
params = dict(net.collect_params())
weight = params["resnetv10_conv0_weight"].data()
An error occurred: Parameter 'resnetv10_conv0_weight' was not initialized on context cpu(0).
I encounter this error only if the model is trained with multiple GPUs (--num-gpus > 1).
When I use --num-gpus 1, everything works smoothly.
So how can the error be resolved?
Can you check with the following command where the parameters are located?
print(params["resnetv10_conv0_weight"].data().context)
You may have to manually copy the parameters to the right context with the copyto function: https://beta.mxnet.io/api/ndarray/_autogen/mxnet.ndarray.NDArray.copyto.html
@kaizhao you need to specify the context you want the data from, for example:
params["resnetv10_conv0_weight"].data(ctx=mx.gpu(0))
Ok I got it. Thanks @NRauschmayr also.
I have another question: if I make some changes to the parameters by
p = params["conv1_weight"].data(ctx=mx.gpu(0))
p = change_the_params(p)
params["conv1_weight"].set_data(p)
How can I sync the changes to all other devices? Or will it be done automatically by calling .set_data()?