Image classification example accuracy issue

Hi,
I am currently using the default examples from https://github.com/apache/incubator-mxnet/tree/master/example/image-classification.

I used train_imagenet.py to train on the MINC-2500 dataset (http://opensurfaces.cs.cornell.edu/publications/minc/) and also on an ImageNet subset with 15 classes, using the GoogLeNet and AlexNet networks. The network does not converge at all, even after 30 epochs. What could be the reason?

AlexNet:
python train_imagenet.py --network alexnet --data-train data/minc-2500/minc_train.rec --data-val data/minc-2500/minc_val.rec --batch-size 32 --gpus 0

INFO:root:start with arguments Namespace(batch_size=32, benchmark=0, data_nthreads=4, data_train='data/minc-2500/minc_train.rec', data_train_idx='', data_val='data/minc-2500/minc_val.rec', data_val_idx='', disp_batches=20, dtype='float32', gc_threshold=0.5, gc_type='none', gpus='0', image_shape='3,362,362', initializer='default', kv_store='device', load_epoch=None, loss='', lr=0.1, lr_factor=0.1, lr_step_epochs='30,60', macrobatch_size=0, max_random_aspect_ratio=0.25, max_random_h=36, max_random_l=50, max_random_rotate_angle=10, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0.1, min_random_scale=1, model_prefix=None, mom=0.9, monitor=0, network='alexnet', num_classes=23, num_epochs=50, num_examples=57500, num_layers=50, optimizer='sgd', pad_size=0, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, warmup_epochs=5, warmup_strategy='linear', wd=0.0001)

INFO:root:Epoch[0] Batch [20] Speed: 127.70 samples/sec accuracy=0.043155
INFO:root:Epoch[0] Batch [40] Speed: 131.16 samples/sec accuracy=0.039062
INFO:root:Epoch[0] Batch [60] Speed: 130.84 samples/sec accuracy=0.037500
INFO:root:Epoch[0] Batch [80] Speed: 131.04 samples/sec accuracy=0.065625
INFO:root:Epoch[0] Batch [100] Speed: 131.38 samples/sec accuracy=0.042188
INFO:root:Epoch[0] Batch [120] Speed: 131.09 samples/sec accuracy=0.045312
INFO:root:Epoch[0] Batch [140] Speed: 131.24 samples/sec accuracy=0.051562
INFO:root:Epoch[0] Batch [160] Speed: 131.48 samples/sec accuracy=0.050000
INFO:root:Epoch[0] Batch [180] Speed: 131.31 samples/sec accuracy=0.045312
INFO:root:Epoch[0] Batch [200] Speed: 131.07 samples/sec accuracy=0.045312
INFO:root:Epoch[0] Batch [220] Speed: 131.06 samples/sec accuracy=0.043750
INFO:root:Epoch[0] Batch [240] Speed: 130.68 samples/sec accuracy=0.037500
INFO:root:Epoch[0] Batch [260] Speed: 131.18 samples/sec accuracy=0.043750
INFO:root:Epoch[0] Batch [280] Speed: 131.09 samples/sec accuracy=0.053125
INFO:root:Epoch[0] Batch [300] Speed: 131.29 samples/sec accuracy=0.032813
INFO:root:Epoch[0] Batch [320] Speed: 131.08 samples/sec accuracy=0.032813
INFO:root:Epoch[0] Batch [340] Speed: 131.19 samples/sec accuracy=0.051562
INFO:root:Epoch[0] Batch [360] Speed: 130.34 samples/sec accuracy=0.039062
INFO:root:Epoch[0] Batch [380] Speed: 131.02 samples/sec accuracy=0.031250

The accuracy oscillates between 0.03 and 0.06 in every epoch (I trained for 30 epochs).
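For context, num_classes=23 in the logged arguments, so a model that guesses uniformly at random would score about 1/23, roughly 0.0435. Assuming the classes are balanced (MINC-2500 has 2,500 images per category, and num_examples=57500 = 23 × 2500), the logged accuracies sit right at chance level, i.e. the network has learned essentially nothing. A quick sketch of that arithmetic, using a handful of values copied from the AlexNet log:

```python
# Chance-level accuracy for a balanced classification problem:
# uniform guessing is expected to score about 1 / num_classes.
num_classes = 23  # num_classes from the logged arguments

chance = 1.0 / num_classes
print(f"chance accuracy: {chance:.4f}")  # prints: chance accuracy: 0.0435

# A few training accuracies copied from the AlexNet log above.
logged = [0.043155, 0.039062, 0.037500, 0.065625, 0.042188, 0.045312]
mean_acc = sum(logged) / len(logged)
print(f"mean logged accuracy: {mean_acc:.4f}")  # prints: mean logged accuracy: 0.0455
```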

GoogLeNet:
python train_imagenet.py --network googlenet --data-train data/minc-2500/minc_train.rec --data-val data/minc-2500/minc_val.rec --batch-size 32 --gpus 0

(batch_size=32, benchmark=0, data_nthreads=4, data_train='data/minc-2500/minc_train.rec', data_train_idx='', data_val='data/minc-2500/minc_val.rec', data_val_idx='', disp_batches=20, dtype='float32', gc_threshold=0.5, gc_type='none', gpus='0', image_shape='3,362,362', initializer='default', kv_store='device', load_epoch=None, loss='', lr=0.1, lr_factor=0.1, lr_step_epochs='30,60', macrobatch_size=0, max_random_aspect_ratio=0.25, max_random_h=36, max_random_l=50, max_random_rotate_angle=10, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0.1, min_random_scale=1, model_prefix=None, mom=0.9, monitor=0, network='googlenet', num_classes=23, num_epochs=50, num_examples=57500, num_layers=50, optimizer='sgd', pad_size=0, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, warmup_epochs=5, warmup_strategy='linear', wd=0.0001)

INFO:root:Epoch[0] Train-accuracy=0.053571
INFO:root:Epoch[0] Time cost=952.415
INFO:root:Epoch[0] Validation-accuracy=0.044097

INFO:root:Epoch[5] Train-accuracy=0.057292
INFO:root:Epoch[5] Time cost=943.619
INFO:root:Epoch[5] Validation-accuracy=0.044097

INFO:root:Epoch[15] Train-accuracy=0.041667
INFO:root:Epoch[15] Time cost=946.351
INFO:root:Epoch[15] Validation-accuracy=0.044097

Here too, the network is not converging.
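The validation accuracy is exactly 0.044097 at epochs 0, 5 and 15, which could mean the network predicts the same class for every validation image. A hedged way to check this, assuming you have already collected the network outputs for the validation set as a NumPy array (how to obtain them is not shown here), is to count how many distinct classes are predicted. The helper prediction_collapse_report is hypothetical, not part of the MXNet examples:

```python
import numpy as np

def prediction_collapse_report(logits, labels):
    """Summarize how concentrated a classifier's predictions are.

    logits: array of shape (N, num_classes) with raw network outputs
    labels: array of shape (N,) with integer ground-truth class ids
    """
    preds = np.argmax(logits, axis=1)
    counts = np.bincount(preds, minlength=logits.shape[1])
    top_class = int(np.argmax(counts))
    return {
        "accuracy": float(np.mean(preds == labels)),
        "most_common_class": top_class,
        "fraction_most_common": float(counts[top_class] / len(preds)),
        "distinct_predictions": int(np.count_nonzero(counts)),
    }

# Synthetic example of a collapsed network: every image gets class 5.
rng = np.random.default_rng(0)
labels = rng.integers(0, 23, size=230)
logits = np.zeros((230, 23))
logits[:, 5] = 1.0
report = prediction_collapse_report(logits, labels)
print(report)
```

If distinct_predictions is 1 or fraction_most_common is close to 1.0 on the real validation outputs, the network has collapsed to a single output rather than merely learning slowly.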

I tried tuning the learning rate, but it had no effect.

I used im2rec.py to create the .rec files.

Here is a sample of the .lst file for the dataset:

16971 6.000000 glass/glass_001971.jpg
54200 21.000000 water/water_001700.jpg
3516 1.000000 carpet/carpet_001016.jpg
25416 10.000000 mirror/mirror_000416.jpg
2885 1.000000 carpet/carpet_000385.jpg
6751 2.000000 ceramic/ceramic_001751.jpg
51509 20.000000 wallpaper/wallpaper_001509.jpg
12101 4.000000 foliage/foliage_002101.jpg
20185 8.000000 leather/leather_000185.jpg
44179 17.000000 sky/sky_001679.jpg
6394 2.000000 ceramic/ceramic_001394.jpg
52674 21.000000 water/water_000174.jpg
32555 13.000000 paper/paper_000055.jpg
24374 9.000000 metal/metal_001874.jpg
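One way to rule out a labelling problem is to sanity-check the .lst file before packing it. The sketch below assumes single-label lines of the form index, label, path (im2rec normally separates them with tabs; the forum rendering shows spaces) and that the class is the top-level folder name, as in the sample above. The function check_lst is hypothetical, not an im2rec feature:

```python
from collections import Counter, defaultdict

def check_lst(lines, num_classes):
    """Sanity-check single-label .lst lines of the form: index, label, path.

    Returns (label_counts, folder_to_labels). Raises ValueError when a label
    is not an integer in [0, num_classes - 1].
    """
    label_counts = Counter()
    folder_to_labels = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on any whitespace; maxsplit keeps paths with spaces intact.
        index, label, path = line.split(maxsplit=2)
        value = float(label)
        if not value.is_integer() or not 0 <= value < num_classes:
            raise ValueError(f"line {index}: label {label} out of range for {path}")
        label_counts[int(value)] += 1
        # Assumes the class is the top-level folder name, as in the sample.
        folder_to_labels[path.split("/")[0]].add(int(value))
    return label_counts, dict(folder_to_labels)

sample = [
    "16971\t6.000000\tglass/glass_001971.jpg",
    "54200\t21.000000\twater/water_001700.jpg",
    "3516\t1.000000\tcarpet/carpet_001016.jpg",
    "2885\t1.000000\tcarpet/carpet_000385.jpg",
]
counts, mapping = check_lst(sample, num_classes=23)
# Folders that map to more than one label would indicate a broken .lst file.
inconsistent = {folder: labels for folder, labels in mapping.items() if len(labels) > 1}
print(counts, inconsistent)
```

Run over the full .lst file, a roughly uniform label_counts and an empty inconsistent dictionary suggest the labels themselves are fine; the sample rows above already map each folder to a single label in alphabetical class order.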

Hardware details:
Wed Jun 12 20:45:57 2019
GPU 0: GeForce GTX 1050 Ti (UUID: GPU-0c8e7d1a-3eba-0c9d-bedd-b4ec3365622c)
GPU 1: Quadro M2000 (UUID: GPU-dad3df22-426f-8ded-acf2-5cafc1395e58)

Can anyone help me understand why it is not converging, or whether I am missing something?

Try optimizer='adam' and lr=0.01.

@ThomasDelteil I tried that, but it still didn't help. However, after I added BatchNorm, the network converges with lr=0.07.
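The post does not show how BatchNorm was added (in the symbolic examples it would typically be an mx.sym.BatchNorm layer after a convolution). As a hedged illustration of what such a layer computes during training, here is a NumPy sketch of the forward pass for NCHW activations. It is not the MXNet implementation: the running statistics used at inference time are omitted, and the eps value is an arbitrary choice:

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for NCHW activations.

    Each channel is normalized with the mean and variance computed over the
    batch and spatial axes, then scaled by gamma and shifted by beta.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

# Badly scaled activations: mean 5, standard deviation 3, 4 channels.
rng = np.random.default_rng(0)
x = rng.normal(5.0, 3.0, size=(32, 4, 8, 8))
y = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=(0, 2, 3)), y.std(axis=(0, 2, 3)))
```

Keeping each layer's inputs near zero mean and unit variance is one reason a network can tolerate a larger learning rate once BatchNorm is added.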


Consider the settings lr=0.1, lr_factor=0.1, lr_step_epochs='30,60' in your log. As far as I understand, it is warming up for 30 epochs, so you should wait for 60 epochs.

PS: Lowering the lr, using Adam and using BN are all good choices.
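Regarding the schedule settings quoted in the reply: as I read the example's common/fit.py, lr_step_epochs lists the epochs at which the learning rate is multiplied by lr_factor, while warmup is controlled separately by warmup_epochs (5 in the log) and warmup_strategy ('linear'). The sketch below approximates that schedule per epoch under those assumptions; the real MXNet scheduler updates per batch, and lr_at_epoch is a hypothetical helper:

```python
def lr_at_epoch(epoch, base_lr=0.1, lr_factor=0.1, step_epochs=(30, 60),
                warmup_epochs=5, warmup_begin_lr=0.0):
    """Approximate learning rate at the start of `epoch` (0-based).

    Linear warmup from warmup_begin_lr to base_lr over warmup_epochs,
    then multiply by lr_factor once for every step epoch already reached.
    """
    if epoch < warmup_epochs:
        return warmup_begin_lr + (base_lr - warmup_begin_lr) * epoch / warmup_epochs
    lr = base_lr
    for step in step_epochs:
        if epoch >= step:
            lr *= lr_factor
    return lr

# Values taken from the logged arguments (num_epochs=50).
for e in (0, 2, 5, 29, 30, 49):
    print(e, f"{lr_at_epoch(e):.4g}")
```

Under these assumptions, with num_epochs=50 only the epoch-30 drop (0.1 to 0.01) would ever take effect, and epochs 5 to 29 all run at the full lr=0.1.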