Training an object detection model with a large number of objects

Hello!

I have encountered an error while training a GluonCV Faster RCNN model.
I was trying to build a Faster RCNN model to detect bacteria in images.
However, the number of labeled bacteria in a single image is very large (>1000 bounding boxes per image)!
The error stops the training procedure right at the beginning. When I reduce the number of labels in every image, everything works again, so I suspect the error comes from having too many objects within a single image.
I'd like to know:
**1. Is there any limit on the maximum number of objects during training?**
2. Can this be solved or bypassed?
3. Where do the numbers 2300 and 3146 come from?

Thank you in advance!

Model: Faster RCNN
Backbone: resnet50_v1b
Input image size: 1024x1024

**MXNetError: [12:23:54] src/operator/tensor/./matrix_op-inl.h:1442: Check failed: ishape[i] >= from_shape[i] (2300 vs. 3146) : Slice axis 0 with size 3146exceeds limit of input with size 2300**

Complete error message

net_name: **faster_rcnn_resnet50_v1b_bact** will be used.
loading annotations into memory...
Done (t=0.04s)
creating index...
index created!
loading annotations into memory...
Done (t=0.46s)
creating index...
index created!
batch_size 1
/usr/local/lib/python3.6/dist-packages/mxnet/gluon/parameter.py:703: UserWarning: Constant parameter "normalizedperclassboxcenterencoder4_means" does not support grad_req other than "null", and new value "write" is ignored.
  'is ignored.'.format(self.name, req))
/usr/local/lib/python3.6/dist-packages/mxnet/gluon/parameter.py:703: UserWarning: Constant parameter "normalizedperclassboxcenterencoder4_stds" does not support grad_req other than "null", and new value "write" is ignored.
  'is ignored.'.format(self.name, req))
/usr/local/lib/python3.6/dist-packages/mxnet/gluon/parameter.py:703: UserWarning: Constant parameter "fasterrcnn1_rpn0_rpnanchorgenerator0_anchor_" does not support grad_req other than "null", and new value "write" is ignored.
  'is ignored.'.format(self.name, req))
INFO:root:<__main__.Args object at 0x7f157cc04358>
INFO:root:Start training from [Epoch 0]
---------------------------------------------------------------------------
MXNetError                                Traceback (most recent call last)
<ipython-input-51-0bba58febb4a> in <module>()
    198 
    199 # training
--> 200 train(net, train_data, val_data, eval_metric, batch_size, ctx, args)

11 frames
/usr/local/lib/python3.6/dist-packages/mxnet/base.py in check_call(ret)
    253     """
    254     if ret != 0:
--> 255         raise MXNetError(py_str(_LIB.MXGetLastError()))
    256 
    257 

**MXNetError: [12:23:54] src/operator/tensor/./matrix_op-inl.h:1442: Check failed: ishape[i] >= from_shape[i] (2300 vs. 3146) : Slice axis 0 with size 3146exceeds limit of input with size 2300**
 Stack trace:
      **[bt] (0) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x6d554b) [0x7f15c233e54b]**
**      [bt] (1) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x363690f) [0x7f15c529f90f]**
**      [bt] (2) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(mxnet::imperative::SetShapeType(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, mxnet::DispatchMode*)+0x1d27) [0x7f15c55d8577]**
**      [bt] (3) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(mxnet::Imperative::Invoke(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0x1db) [0x7f15c55e1dcb]**
**      [bt] (4) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x3839f1f) [0x7f15c54a2f1f]**
**      [bt] (5) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(MXImperativeInvokeEx+0x62) [0x7f15c54a34e2]**
**      [bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f1623420dae]**
**      [bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x22f) [0x7f162342071f]**
**      [bt] (8) /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2b4) [0x7f16236345c4]**

Not answering your question :slight_smile: but how about using crowd-counting approaches? They work quite well for counting large numbers of occurrences with occlusion. I created a request to include them in GluonCV (feel free to +1), and you can see the associated research in the request: https://github.com/dmlc/gluon-cv/issues/1240


Thank you, olivcruche!
I think it is a good approach! I will try to understand more about crowd counting to see if it can solve this crowding problem :laughing:!


I found the reason why training failed when I was training the Faster RCNN model with a large number of objects.
The parameters I used were the Faster RCNN model's presets:

rpn_train_pre_nms=12000
rpn_train_post_nms=2000
max_num_gt=300

The first training picture contains 1146 objects.

2300 = rpn_train_post_nms + max_num_gt (2000 + 300).

3146 = rpn_train_post_nms + 1146, the number of ground-truth objects in that image (2000 + 1146).

The error message is saying:

Check failed: ishape[i] >= from_shape[i] (2300 vs. 3146)

My guess is that during training, the first array, which holds 2300 elements, is sliced with slice_like to the shape of a second array containing 3146 elements; since the second array (proposals plus ground truths) is larger than the first, the slice is impossible and the shape check fails.
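For anyone curious, here is a toy reproduction of that shape check with plain MXNet slice_like (just to illustrate the error; this is not the actual GluonCV training code path):

```python
import mxnet as mx

# First array sized for rpn_train_post_nms + max_num_gt = 2000 + 300 = 2300 rows
data = mx.nd.zeros((2300, 4))
# Second array sized for proposals + ground truths = 2000 + 1146 = 3146 rows
shape_like = mx.nd.zeros((3146, 4))

# slice_like tries to cut `data` down to the shape of `shape_like` along axis 0.
# Because 3146 > 2300 this raises the same
# "Slice axis 0 with size 3146 exceeds limit of input with size 2300" MXNetError.
mx.nd.slice_like(data, shape_like, axes=(0,))
```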

To solve this, simply set max_num_gt to a number larger than the maximum number of ground-truth objects in any single image, and everything works quite well now~
Just leaving a note in case someone else runs into the same mistake.


max_num_gt : int, default is 300
Maximum ground-truth number for each example. This is only an upper bound, not
necessarily very precise. However, using a very big number may impact the training speed.
rpn_train_post_nms : int, default is 2000
Return top proposal results after NMS in training of RPN.
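As a reference, here is a minimal sketch of how the larger max_num_gt can be passed when building the network. This assumes your GluonCV version exposes the faster_rcnn_resnet50_v1b_custom entry in the model zoo and forwards these keyword arguments to the FasterRCNN constructor; the class list is just a placeholder for my bacteria dataset:

```python
from gluoncv import model_zoo

# Placeholder class list for the bacteria dataset
classes = ['bacteria']

# Raise max_num_gt above the largest per-image ground-truth count (1146 here),
# so the padded ground-truth array can hold every labeled box.
net = model_zoo.get_model(
    'faster_rcnn_resnet50_v1b_custom',
    classes=classes,
    pretrained_base=True,
    rpn_train_pre_nms=12000,
    rpn_train_post_nms=2000,
    max_num_gt=1500,
)
```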
