Bind error "Target shape size is different to source"


Recently I pre-trained an MXNet CNN model offline, then saved it to local disk with code like the following:

import json

model.save_checkpoint('model/model', 0)  # epoch 0
with open('model/model-shapes.json', 'w') as shapes:
    json.dump([{"shape": model.data_shapes[0][1], "name": "data"}], shapes)


Afterwards I tried to deploy an endpoint on SageMaker with this model. An exception was thrown while SageMaker was running its health check during model deployment:

RuntimeError: simple_bind error. Arguments:
    data: (1, 32)
    Error in operator reshape0: [16:21:39] src/operator/tensor/./matrix_op-inl.h:157: Check failed: oshape.Size() == dshape.Size() (480000 vs. 9600) Target shape size is different to source. Target: [50,1,32,300]
    Source: [1,32,300]


It looks like the model expects a prediction input with exactly the same data shape as the training input.

The model was trained with batch_size = 50. When I ran predictions locally, feeding a single data point didn't work. My workaround was to tile the single data point 50 times so the input matched the training shape exactly. But this trick can't be applied on SageMaker, because the exception is thrown during the health check stage, before I even have a chance to supply any input.
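For reference, that local tiling workaround can be sketched like this (the (1, 32) shape is taken from the error message; the data point itself is just a placeholder):

```python
import numpy as np

# Hypothetical single data point with the shape from the error message.
data_point = np.zeros((1, 32))

# Tile it along the batch axis to the training batch size of 50,
# so the input shape matches what the model was bound with.
batch = np.tile(data_point, (50, 1))  # shape (50, 32)
```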

I was wondering if anyone has an easy way (other than retraining the model with a batch size of 1) to solve this issue? I've heard about the re-bind concept, but I'm not sure how to use it properly.



Don’t save the JSON manually; save_checkpoint will save a JSON file.


@zhreshold thanks for your reply!
I just tried the save_checkpoint method. It looks like only model-XXXX.params and model-symbol.json were generated; model-shapes.json was missing, so I still have to generate it manually. Maybe I used the wrong method? Any ideas?


why do you need a model shape json file?