Softmax Regression in Gluon

My results after 20 epochs (5 is too few to compare):

BS 1000, LR 0.1

  • epoch 10, loss 0.5247, train acc 0.827, test acc 0.832
  • epoch 20, loss 0.4783, train acc 0.839, test acc 0.842

BS 100, LR 0.1

  • epoch 10, loss 0.4271, train acc 0.852, test acc 0.852
  • epoch 20, loss 0.4072, train acc 0.859, test acc 0.857

BS 1000, LR 0.5

  • epoch 10, loss 0.8573, train acc 0.803, test acc 0.772
  • epoch 20, loss 0.7668, train acc 0.813, test acc 0.837

BS 10, LR 0.01

  • epoch 10, loss 0.4221, train acc 0.855, test acc 0.856
  • epoch 20, loss 0.4019, train acc 0.862, test acc 0.852

So in conclusion:

  • BS and LR must be tuned together, and LR has an upper bound above which the updates jump around wildly in every direction.
  • The smaller the BS, the slower each epoch, but the fewer epochs we need to reach an optimum.
  • The training set accuracy always improves, but at some point the test set accuracy decreases: we are only getting better at fitting the training set. I guess we can detect this stage with the test set: when the accuracy on this set, which is not used for training, starts to decrease, it’s time to stop training.

Can we configure the training without a fixed number of epochs, but instead with a “stop or continue” function?

Hi @SebastienCoste,

Sure, you could write a while loop over the epochs that stops once the most recent test loss is no longer lower than the test loss from the epoch before. This is called ‘early stopping’ and is used to prevent overfitting.
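A minimal sketch of that idea, independent of the book's code: `train_epoch` and `evaluate_loss` are hypothetical stand-ins for one training pass and one test-set evaluation, and `patience` lets you tolerate a few non-improving epochs before stopping.

```python
def train_with_early_stopping(train_epoch, evaluate_loss, patience=1, max_epochs=100):
    """Run epochs until the test loss fails to improve `patience` times in a row."""
    best = float("inf")
    bad_epochs = 0
    for epoch in range(max_epochs):
        train_epoch()                  # one pass over the training data
        test_loss = evaluate_loss()    # loss on the held-out test set
        if test_loss < best:
            best, bad_epochs = test_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                  # test loss stopped improving: stop training
    return epoch + 1, best
```

With `patience=1` this is exactly the “stop as soon as the test loss goes up” rule; a larger `patience` is more robust to noisy per-epoch losses.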

I am getting the following error:

```
TypeError                                 Traceback (most recent call last)
      1 num_epochs = 10
----> 2 d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)

~/miniconda3/lib/python3.7/site-packages/d2l/ in train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr, trainer)
    212         l.backward()
    213         if trainer is None:
--> 214             sgd(params, lr, batch_size)
    215         else:
    216             trainer.step(batch_size)

~/miniconda3/lib/python3.7/site-packages/d2l/ in sgd(params, lr, batch_size)
     60 def sgd(params, lr, batch_size):
     61     """Mini-batch stochastic gradient descent."""
---> 62     for param in params:
     63         param[:] = param - lr * param.grad / batch_size

TypeError: 'NoneType' object is not iterable
```

From the above error message, I found that “params” is not passed in the function call. What should be my next step?

Just figured out where I was going wrong.
The correct function call is: `d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, trainer)`
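To see why the original 6-argument call failed, here is a sketch using a dummy function with the signature shown in the traceback (`train_ch3_like` is a stand-in, not the real `d2l` function): passing `updater` as the sixth positional argument puts it in the `batch_size` slot, so `trainer` stays `None` and the code falls through to `sgd(params=None, ...)`.

```python
def train_ch3_like(net, train_iter, test_iter, loss, num_epochs,
                   batch_size, params=None, lr=None, trainer=None):
    """Dummy with the traceback's signature; returns which branch would run."""
    if trainer is None:
        if params is None:
            # This mirrors the failure: sgd(None, ...) iterates over None.
            raise TypeError("'NoneType' object is not iterable")
        return "sgd"
    return "trainer.step"

# Broken: `updater` is absorbed as batch_size, trainer is still None.
# train_ch3_like(net, train_iter, test_iter, loss, 10, updater)  -> TypeError

# Fixed: put the trainer in its own slot (or pass it by keyword, trainer=updater).
# train_ch3_like(net, train_iter, test_iter, loss, 10, 256, None, None, updater)
```

Passing it by keyword (`trainer=updater`) avoids having to spell out the `None` placeholders.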