I am trying to fine-tune an SSD model (resnet50, data shape 512). I have about 850 training images, and when I run train.py the validation mAP starts low and climbs to roughly 0.25-0.30 by around epoch 70. After that it stays in that range indefinitely.
I am using the official Apache incubator-mxnet repo (example/ssd), but I had to make some code changes to get it to work.
Specifically, I modified this block of code in train_net.py to match this version, in order to remove layers that do not seem to match the pretrained model.
I used this command to start the training:
python train.py --network resnet50 --train-path data/train.rec --val-path data/test.rec --class-names text_block --num-class 1 --data-shape 512 --lr 0.0001 --finetune 1 --end-epoch 1000 --gpus 0 --val-list test
I stopped the training at epoch 457. Here is the full log:
https://s3.amazonaws.com/read-to-me-dataset/train.log
Sample output from my logs during training:
INFO:root:Epoch[216] Validation-text_block=0.277103
INFO:root:Epoch[216] Validation-mAP=0.277103
INFO:root:Epoch[217] Batch [20] Speed: 92.03 samples/sec CrossEntropy=0.573019 SmoothL1=0.505079
INFO:root:Epoch[217] Train-CrossEntropy=0.568360
INFO:root:Epoch[217] Train-SmoothL1=0.480534
INFO:root:Epoch[217] Time cost=8.733
INFO:root:Saved checkpoint to "/home/aschu/development/apache-incubator/mxnet/incubator-mxnet/example/ssd/model/ssd_resnet50_512-0218.params"
INFO:root:Epoch[217] Validation-text_block=0.266455
INFO:root:Epoch[217] Validation-mAP=0.266455
INFO:root:Epoch[218] Batch [20] Speed: 97.72 samples/sec CrossEntropy=0.570998 SmoothL1=0.501688
INFO:root:Epoch[218] Train-CrossEntropy=0.566719
INFO:root:Epoch[218] Train-SmoothL1=0.494521
INFO:root:Epoch[218] Time cost=7.926
INFO:root:Saved checkpoint to "/home/aschu/development/apache-incubator/mxnet/incubator-mxnet/example/ssd/model/ssd_resnet50_512-0219.params"
INFO:root:Epoch[218] Validation-text_block=0.237516
INFO:root:Epoch[218] Validation-mAP=0.237516
INFO:root:Epoch[219] Batch [20] Speed: 95.20 samples/sec CrossEntropy=0.570789 SmoothL1=0.487176
INFO:root:Epoch[219] Train-CrossEntropy=0.570646
INFO:root:Epoch[219] Train-SmoothL1=0.475513
INFO:root:Epoch[219] Time cost=8.406
INFO:root:Saved checkpoint to "/home/aschu/development/apache-incubator/mxnet/incubator-mxnet/example/ssd/model/ssd_resnet50_512-0220.params"
INFO:root:Epoch[219] Validation-text_block=0.259088
INFO:root:Epoch[219] Validation-mAP=0.259088
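
To make the plateau easier to see than by scrolling through the log, a small script along these lines could pull out the per-epoch validation mAP. This is only a sketch: the regular expression is an assumption based on the log lines shown above, and `parse_validation_map` and `mean_map` are hypothetical helper names, not part of the SSD example code.

```python
import re

# Pattern assumed from the log lines above, e.g.
# "INFO:root:Epoch[216] Validation-mAP=0.277103"
MAP_LINE = re.compile(r"Epoch\[(\d+)\] Validation-mAP=([0-9.]+)")


def parse_validation_map(lines):
    """Return a list of (epoch, mAP) pairs found in the given log lines."""
    results = []
    for line in lines:
        match = MAP_LINE.search(line)
        if match:
            results.append((int(match.group(1)), float(match.group(2))))
    return results


def mean_map(pairs):
    """Average mAP over the parsed epochs (hypothetical helper)."""
    return sum(value for _, value in pairs) / len(pairs)


# The four validation lines from the sample output above.
sample = [
    "INFO:root:Epoch[216] Validation-mAP=0.277103",
    "INFO:root:Epoch[217] Validation-mAP=0.266455",
    "INFO:root:Epoch[218] Validation-mAP=0.237516",
    "INFO:root:Epoch[219] Validation-mAP=0.259088",
]
pairs = parse_validation_map(sample)
print(pairs)
print(mean_map(pairs))
```

To run it on the full log, the `sample` list could be replaced with the lines of a locally saved copy of train.log, for example `open("train.log").readlines()`.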