Hello!
I have encountered an error while training a GluonCV Faster RCNN model.
I was trying to build a Faster RCNN model to detect bacteria in images. However, the number of labeled bacteria in a single image is very large (>1000 bounding boxes per image).
The error stops the training procedure right at the beginning. When I reduced the number of labels in each image, everything worked again. Because of that, I suspect the error comes from having too many objects within a single image.
I’d like to know
**1. Is there a limit on the maximum number of objects per image during training?**
2. Can this be solved or bypassed?
3. Where do the numbers 2300 and 3146 in the error come from?
Thank you in advance!
Model: Faster RCNN
Backbone: resnet50_v1b
Input image size: 1024x1024
**MXNetError: [12:23:54] src/operator/tensor/./matrix_op-inl.h:1442: Check failed: ishape[i] >= from_shape[i] (2300 vs. 3146) : Slice axis 0 with size 3146exceeds limit of input with size 2300**
Complete error message
net_name: **faster_rcnn_resnet50_v1b_bact** will be used.
loading annotations into memory...
Done (t=0.04s)
creating index...
index created!
loading annotations into memory...
Done (t=0.46s)
creating index...
index created!
batch_size 1
/usr/local/lib/python3.6/dist-packages/mxnet/gluon/parameter.py:703: UserWarning: Constant parameter "normalizedperclassboxcenterencoder4_means" does not support grad_req other than "null", and new value "write" is ignored.
'is ignored.'.format(self.name, req))
/usr/local/lib/python3.6/dist-packages/mxnet/gluon/parameter.py:703: UserWarning: Constant parameter "normalizedperclassboxcenterencoder4_stds" does not support grad_req other than "null", and new value "write" is ignored.
'is ignored.'.format(self.name, req))
/usr/local/lib/python3.6/dist-packages/mxnet/gluon/parameter.py:703: UserWarning: Constant parameter "fasterrcnn1_rpn0_rpnanchorgenerator0_anchor_" does not support grad_req other than "null", and new value "write" is ignored.
'is ignored.'.format(self.name, req))
INFO:root:<__main__.Args object at 0x7f157cc04358>
INFO:root:Start training from [Epoch 0]
---------------------------------------------------------------------------
MXNetError Traceback (most recent call last)
<ipython-input-51-0bba58febb4a> in <module>()
198
199 # training
--> 200 train(net, train_data, val_data, eval_metric, batch_size, ctx, args)
11 frames
/usr/local/lib/python3.6/dist-packages/mxnet/base.py in check_call(ret)
253 """
254 if ret != 0:
--> 255 raise MXNetError(py_str(_LIB.MXGetLastError()))
256
257
**MXNetError: [12:23:54] src/operator/tensor/./matrix_op-inl.h:1442: Check failed: ishape[i] >= from_shape[i] (2300 vs. 3146) : Slice axis 0 with size 3146exceeds limit of input with size 2300**
Stack trace:
**[bt] (0) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x6d554b) [0x7f15c233e54b]**
** [bt] (1) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x363690f) [0x7f15c529f90f]**
** [bt] (2) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(mxnet::imperative::SetShapeType(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, mxnet::DispatchMode*)+0x1d27) [0x7f15c55d8577]**
** [bt] (3) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(mxnet::Imperative::Invoke(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0x1db) [0x7f15c55e1dcb]**
** [bt] (4) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x3839f1f) [0x7f15c54a2f1f]**
** [bt] (5) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(MXImperativeInvokeEx+0x62) [0x7f15c54a34e2]**
** [bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f1623420dae]**
** [bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x22f) [0x7f162342071f]**
** [bt] (8) /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2b4) [0x7f16236345c4]**
Not answering your question directly, but how about using crowd-counting approaches? They work quite well for counting large numbers of occurrences with occlusion. I created a request to include them in GluonCV (feel free to +1), and you can see the associated research in the request: https://github.com/dmlc/gluon-cv/issues/1240
Thank you, olivcruche!
I think it is a good approach! I will read more about crowd counting to see if it can solve this crowding problem!
I found the reason why training failed when I trained the Faster RCNN model with a large number of objects.
The preset parameters of the Faster RCNN model I used were:
`rpn_train_pre_nms=12000`
`rpn_train_post_nms=2000`
`max_num_gt=300`
The first training picture contains 1146 objects.
The 2300 = `rpn_train_post_nms` + `max_num_gt`.
The 3146 = `rpn_train_post_nms` + 1146.
The error message is saying:
`Check failed: ishape[i] >= from_shape[i] (2300 vs. 3146)`
My guess is that during training, the first array, containing 2300 elements, is sliced with `slice_like` against the second array, containing 3146 elements. The second array is larger than the first, so the slice exceeds the input's size and the check fails.
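The arithmetic behind the two numbers can be sketched in a few lines of plain Python. This is just an illustration of the size calculation, not GluonCV's actual code; the parameter names match the Faster RCNN presets above, and the comparison mirrors the failed `ishape[i] >= from_shape[i]` check:

```python
# Sizes from the Faster RCNN preset used above.
rpn_train_post_nms = 2000   # proposals kept after NMS during training
max_num_gt = 300            # assumed upper bound on ground-truth boxes per image
num_gt_in_image = 1146      # actual boxes in the first training picture

# Buffer size the model allocates vs. the size it actually needs.
allocated = rpn_train_post_nms + max_num_gt       # 2300
required = rpn_train_post_nms + num_gt_in_image   # 3146

# This mirrors the failed check: ishape[i] >= from_shape[i].
print(allocated, required, allocated >= required)  # → 2300 3146 False
```

As soon as any image contains more than `max_num_gt` boxes, `required` exceeds `allocated` and training aborts with the error above.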
To solve this, simply set `max_num_gt` to a number larger than the maximum number of ground-truth objects in any single picture, and everything works quite well now~
Just taking a note here in case someone makes the same mistake as me.
`max_num_gt` : int, default is 300
Maximum ground-truth number for each example. This is only an upper bound, not necessarily very precise. However, using a very big number may impact the training speed.
`rpn_train_post_nms` : int, default is 2000
Return top proposal results after NMS in training of RPN.
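If you are not sure what the worst case in your dataset is, you can compute it from the annotations before picking `max_num_gt`. The helper below is a sketch I am adding for illustration (`safe_max_num_gt` is not a GluonCV function); it assumes COCO-style annotation dicts, where each annotation carries an `"image_id"`:

```python
from collections import Counter

def safe_max_num_gt(annotations, margin=0):
    """Count boxes per image in a COCO-style "annotations" list and
    return a max_num_gt large enough to cover the worst-case image."""
    boxes_per_image = Counter(ann["image_id"] for ann in annotations)
    return max(boxes_per_image.values()) + margin

# Toy annotations: image 1 has 3 boxes, image 2 has 1 box.
anns = [{"image_id": 1}, {"image_id": 1}, {"image_id": 1}, {"image_id": 2}]
print(safe_max_num_gt(anns, margin=10))  # → 13
```

You can then pass the result as `max_num_gt=...` when building the model, since (per the docstring quoted above) it is a Faster RCNN parameter; keep in mind the docstring's warning that a very big number may slow down training.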