I’m looking into the implementation of Faster R-CNN in Gluon-CV and noticed that feature extraction is split into two parts, one before and one after feature pooling. To my knowledge, the second step, which runs the pooled features through the top_feature net, is not described in the Faster R-CNN paper by Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun.
Can someone describe why this additional top_feature extractor is added, or give a paper reference describing this step? Is it to simplify training or does it contribute to better detections during inference?
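To make the structure I'm asking about concrete, here is a rough, shape-only sketch of the two-part extraction. It builds no real network and does not use the Gluon-CV API. The function names, strides, channel counts, pooled size and number of RoIs are all my own illustrative assumptions (loosely modelled on a ResNet-50-style backbone), not values taken from the implementation.

```python
# Shape-only sketch of the two-part feature extraction; no real network is built.
# All names and numbers here are illustrative assumptions, not Gluon-CV's API.

def base_features(image_shape, stride=16, channels=1024):
    """Shared backbone, run once per image: (C, H, W) -> (channels, H/stride, W/stride)."""
    _, height, width = image_shape
    return (channels, height // stride, width // stride)


def roi_pool(feature_shape, num_rois, pooled_size=14):
    """Feature pooling: each proposal is cropped from the shared map to a fixed grid."""
    channels, _, _ = feature_shape
    return (num_rois, channels, pooled_size, pooled_size)


def top_features(pooled_shape, stride=2, channels=2048):
    """Second extractor, run once per RoI, followed by global average pooling."""
    num_rois, _, height, width = pooled_shape
    conv_out = (num_rois, channels, height // stride, width // stride)
    # Global average pooling collapses the spatial dimensions.
    return (conv_out[0], conv_out[1])


def trace_shapes(image_shape=(3, 600, 800), num_rois=128):
    """Return the tensor shapes after each of the three stages."""
    shared = base_features(image_shape)
    pooled = roi_pool(shared, num_rois)
    per_roi = top_features(pooled)
    return shared, pooled, per_roi


if __name__ == "__main__":
    for name, shape in zip(("shared", "pooled", "per_roi"), trace_shapes()):
        print(name, shape)
```

The point of the sketch is only that the first part runs once per image, while the second part runs once per pooled region before the class and box predictors.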
Below is the link to the above-mentioned implementation: