Softmax Regression in Gluon

https://d2l.ai/chapter_linear-networks/softmax-regression-concise.html

My results with 20 epochs (5 is too small for a meaningful comparison):

BS 1000, LR 0.1

  • epoch 10, loss 0.5247, train acc 0.827, test acc 0.832
  • epoch 20, loss 0.4783, train acc 0.839, test acc 0.842

BS 100, LR 0.1

  • epoch 10, loss 0.4271, train acc 0.852, test acc 0.852
  • epoch 20, loss 0.4072, train acc 0.859, test acc 0.857

BS 1000, LR 0.5

  • epoch 10, loss 0.8573, train acc 0.803, test acc 0.772
  • epoch 20, loss 0.7668, train acc 0.813, test acc 0.837

BS 10, LR 0.01

  • epoch 10, loss 0.4221, train acc 0.855, test acc 0.856
  • epoch 20, loss 0.4019, train acc 0.862, test acc 0.852

So in conclusion:

  • BS and LR must be correlated, and LR has an upper bound above which the updates jump around wildly in every direction.
  • The smaller the BS, the slower each epoch runs, but the fewer epochs we need to reach an optimum.
  • The training set accuracy always improves, but at some point the test set accuracy starts to decrease: we’re only getting better at recognizing the training set itself. I guess we can detect this stage with the test set: when the accuracy on this set, which is never used for training, starts to drop, it’s time to stop training.

Can we configure the training without a fixed number of epochs, but with a “stop or continue” function instead?

Hi @SebastienCoste,

Sure, you could write a while loop over the epochs that keeps training as long as the most recent test loss is lower than the test loss from the epoch before. It’s called ‘early stopping’ and is used to prevent overfitting.
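
Here is a minimal sketch of that idea. Note that train_epoch and evaluate_test_loss are hypothetical helper names standing in for one pass of train_epoch_ch3 and an evaluation loop over the test iterator; they are not functions from the d2l package:

    # Early-stopping sketch: train until the test loss stops improving.
    # train_epoch and evaluate_test_loss are hypothetical helpers.
    best_test_loss = float('inf')
    patience, bad_epochs, epoch = 2, 0, 0  # tolerate a couple of flat epochs
    while bad_epochs < patience:
        epoch += 1
        train_epoch(net, train_iter, loss, trainer)
        test_loss = evaluate_test_loss(net, test_iter, loss)
        if test_loss < best_test_loss:
            best_test_loss, bad_epochs = test_loss, 0
        else:
            bad_epochs += 1
    print('stopped after %d epochs, best test loss %.4f' % (epoch, best_test_loss))

The patience counter is a common refinement: the test loss is noisy, so it is safer to stop only after a few consecutive epochs without improvement rather than on the first one.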

I am getting the following error:

TypeError Traceback (most recent call last)
in
1 num_epochs = 10
----> 2 d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)

~/miniconda3/lib/python3.7/site-packages/d2l/train.py in train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr, trainer)
212 l.backward()
213 if trainer is None:
--> 214 sgd(params, lr, batch_size)
215 else:
216 trainer.step(batch_size)

~/miniconda3/lib/python3.7/site-packages/d2l/train.py in sgd(params, lr, batch_size)
60 def sgd(params, lr, batch_size):
61 """Mini-batch stochastic gradient descent."""
--> 62 for param in params:
63 param[:] = param - lr * param.grad / batch_size
64

TypeError: 'NoneType' object is not iterable

From the above error message, I found that “params” is not passed in the function call. What should be my next step?

Just figured out where I was going wrong.
The correct function call is: d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, trainer). Since a trainer is passed, the branch that needs params and lr (the sgd fallback) is never taken, so None works for both.

Hi everyone, I am having trouble training the model with the function train_ch3().
Calling d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, trainer)
raises the following error:
train_ch3() takes 6 positional arguments but 9 were given.

Can anyone help, please?

This is the error that I am getting. Can you please help?

TypeError Traceback (most recent call last)
in
3
4 d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None,
----> 5 None, trainer)

TypeError: train_ch3() takes 6 positional arguments but 9 were given

Seems like they have updated the function. The new definition is as follows:

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)
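
With the updated signature, batch_size, params and lr are gone: you pass the Gluon Trainer (or a custom updater function) as the single updater argument, and the current d2l training loop should detect a Gluon Trainer and call its step method with the batch size internally. A sketch, assuming net, train_iter, test_iter and loss are defined as earlier in the chapter:

    import d2l
    from mxnet import gluon

    # The Trainer is passed directly as the single `updater` argument;
    # d2l's training loop calls trainer.step(batch_size) on it.
    trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
    num_epochs = 10
    d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)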

I am getting the following error with the current version of the code:


TypeError Traceback (most recent call last)
in
1 num_epochs = 10
----> 2 d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)

C:\Anaconda3\lib\site-packages\d2l\d2l.py in train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)
186 legend=['train loss', 'train acc', 'test acc'])
187 for epoch in range(num_epochs):
--> 188 train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
189 test_acc = evaluate_accuracy(net, test_iter)
190 animator.add(epoch+1, train_metrics+(test_acc,))

C:\Anaconda3\lib\site-packages\d2l\d2l.py in train_epoch_ch3(net, train_iter, loss, updater)
127 l = loss(y_hat, y)
128 l.backward()
--> 129 updater()
130 # measure loss and accuracy
131 train_l_sum += l.sum().asscalar()

TypeError: 'Trainer' object is not callable

Can you please help in this regard? Thanks in advance 🙂

I fixed the issue by installing the latest version of “d2l” using the command below:

pip install git+https://github.com/d2l-ai/d2l-en

I think that, since the code is changing frequently, we should always use the installation procedure above.

Two things in 3.7.2.

  1. z_j is the j-th element of the input y_linear variable.
    What does the input y_linear mean?

  2. But instead of passing softmax probabilities into our new loss function, we’ll just pass ŷ…
    ŷ is the value calculated by the softmax function. So I believe that ŷ should be replaced with z.
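
For what it’s worth, y_linear in that section is the output of the linear layer before the softmax, i.e. the vector of logits z, which is consistent with the point in item 2: the numerically stable loss takes the logits, not the post-softmax probabilities. Below is a minimal sketch of the log-sum-exp idea the section describes, written with plain MXNet NDArray ops (the function names are my own, not from the book):

    from mxnet import nd

    def stable_log_softmax(z):
        # log softmax(z)_j = (z_j - max(z)) - log(sum_k exp(z_k - max(z)));
        # subtracting the row-wise max avoids overflow in exp()
        z_max = z.max(axis=1, keepdims=True)
        shifted = z - z_max
        return shifted - shifted.exp().sum(axis=1, keepdims=True).log()

    def cross_entropy_from_logits(z, y):
        # the logits z go straight in, never the softmax probabilities;
        # nd.pick selects the log-probability of the true class per row
        return -nd.pick(stable_log_softmax(z), y)

    z = nd.array([[100., 0., -100.], [0.1, 0.2, 0.3]])  # logits (y_linear)
    y = nd.array([0, 2])                                # true class indices
    print(cross_entropy_from_logits(z, y))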

I am getting the following error:
in
1 num_epochs = 10
----> 2 d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)

/media/vr/Storage/Python_Scripts/psenv/lib/python3.6/site-packages/d2l/d2l.py in train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)
284 legend=['train loss', 'train acc', 'test acc'])
285 for epoch in range(num_epochs):
--> 286 train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
287 test_acc = evaluate_accuracy(net, test_iter)
288 animator.add(epoch+1, train_metrics+(test_acc,))

/media/vr/Storage/Python_Scripts/psenv/lib/python3.6/site-packages/d2l/d2l.py in train_epoch_ch3(net, train_iter, loss, updater)
233 l.backward()
234 updater(X.shape[0])
--> 235 metric.add(float(l.sum()), accuracy(y_hat, y), y.size)
236 # Return training loss and training accuracy
237 return metric[0]/metric[2], metric[1]/metric[2]

TypeError: float() argument must be a string or a number, not 'NDArray'
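
In case anyone else runs into this: it looks like another version mismatch. The call float(l.sum()) assumes an MXNet version whose NDArray supports conversion with float() on size-1 arrays; an older MXNet raises exactly this TypeError. Reinstalling the latest d2l together with a matching MXNet, as suggested earlier in the thread, should fix it. Alternatively, under that assumption, an explicit conversion works:

    # Workaround sketch: asscalar() turns a size-1 NDArray into a Python
    # float, avoiding float(NDArray), which older MXNet versions reject.
    metric.add(l.sum().asscalar(), accuracy(y_hat, y), y.size)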