Convolutional Neural Networks (LeNet)

https://en.diveintodeeplearning.org/chapter_convolutional-neural-networks/lenet.html

Hello!

First of all, thank you for the great learning material!

In the chapter on the LeNet architecture, you mention that your implementation matches the historical definition of LeNet-5 (Gradient-Based Learning Applied to Document Recognition) except for the last layer, but comparing against subsection B (LeNet-5) of the paper, I found two other inconsistencies.

  • The LeNet paper does not describe the pooling layer as an average pooling layer, but rather as a layer that sums over each 2x2 neighborhood of the input feature map, multiplies the sum by a trainable weight, adds a trainable bias, and finally passes the result through a sigmoidal function.

  • According to the LeNet paper, the activation function used in both the convolutional and fully connected layers is a scaled hyperbolic tangent, not the sigmoid used in the code. These two functions look similar but have different output ranges (http://m.wolframalpha.com/input/?i=tanh(a)%2C+sigmoid(a)).
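
To make the two points above concrete, here is a minimal NumPy sketch of the sub-sampling layer and the scaled tanh as described in the paper. It is not the book's implementation. The function names (`scaled_tanh`, `lenet_subsample`, `avg_pool_2x2`) are made up for illustration, the layer handles a single 2D feature map only, and the default squashing function is an assumption. The constants A = 1.7159 and S = 2/3 are the ones given in the paper.

```python
import numpy as np


def scaled_tanh(a, A=1.7159, S=2.0 / 3.0):
    """Scaled hyperbolic tangent f(a) = A * tanh(S * a)."""
    return A * np.tanh(S * a)


def sigmoid(a):
    """Logistic sigmoid, as used in the book's code."""
    return 1.0 / (1.0 + np.exp(-a))


def lenet_subsample(x, coeff, bias, activation=scaled_tanh):
    """Sum each non-overlapping 2x2 block of a (H, W) feature map,
    multiply by a trainable coefficient, add a trainable bias,
    then apply a squashing function. H and W must be even."""
    h, w = x.shape
    block_sums = x.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))
    return activation(coeff * block_sums + bias)


def avg_pool_2x2(x):
    """Plain 2x2 average pooling with stride 2, for comparison."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


if __name__ == "__main__":
    x = np.arange(16, dtype=float).reshape(4, 4)
    # With coeff=0.25, bias=0 and no squashing, the layer reduces to
    # ordinary average pooling.
    identity = lambda a: a
    print(np.allclose(lenet_subsample(x, 0.25, 0.0, identity), avg_pool_2x2(x)))
    # Output ranges differ: sigmoid stays in (0, 1), scaled tanh in (-A, A).
    a = np.linspace(-10.0, 10.0, 5)
    print(sigmoid(a).min(), sigmoid(a).max())
    print(scaled_tanh(a).min(), scaled_tanh(a).max())
```

Seen this way, average pooling is the special case where the coefficient is fixed at 1/4, the bias at 0, and no nonlinearity is applied.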

If there is something I missed and your implementation of LeNet5 is correct, please let me know.

Cheers,
Martin

Hey Martin,

Pooling was called sub-sampling in the original paper. According to page 6 of the paper:

"This can be achieved with a so-called subsampling layers which performs a local averaging and a subsampling reducing the resolution of the feature map and reducing the sensitivity of the output to shifts and distortions"

Also, as for tanh vs. sigmoid, it seems that tanh converges faster than sigmoid (especially useful 20 years ago, when compute power was much more limited).
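
As a rough illustration of how the two functions differ, the sketch below checks the identity tanh(a) = 2 * sigmoid(2a) - 1 and compares the slopes at zero. The helper names are made up for this example. The sketch only shows that tanh is zero-centered and has a steeper gradient around the origin, which is one commonly cited explanation for faster convergence. It does not measure convergence itself.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def sigmoid_grad(a):
    # d/da sigmoid(a) = sigmoid(a) * (1 - sigmoid(a))
    s = sigmoid(a)
    return s * (1.0 - s)


def tanh_grad(a):
    # d/da tanh(a) = 1 - tanh(a)^2
    return 1.0 - np.tanh(a) ** 2


if __name__ == "__main__":
    a = np.linspace(-4.0, 4.0, 9)
    # tanh is a shifted and rescaled sigmoid.
    print(np.allclose(np.tanh(a), 2.0 * sigmoid(2.0 * a) - 1.0))
    # Output at zero: tanh is centered at 0, sigmoid at 0.5.
    print(np.tanh(0.0), sigmoid(0.0))
    # Slope at zero: tanh is four times steeper than sigmoid.
    print(tanh_grad(0.0), sigmoid_grad(0.0))
```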

Hope this helps!
Rachel

Just want to point out that the link to the Multilayer Perceptron section on this page of the book no longer works:
http://www.d2l.ai/chapter_deep-learning-basics/mlp-scratch.md

Thanks. Please refer to http://d2l.ai/chapter_multilayer-perceptrons/mlp-scratch.html

Hi,

Thanks for the learning material!

But I ran into a problem when using the code.

# Save to the d2l package.
def train_ch5(net, train_iter, test_iter, num_epochs, lr, ctx=d2l.try_gpu()):
    net.initialize(force_reinit=True, ctx=ctx, init=init.Xavier())
    loss = gluon.loss.SoftmaxCrossEntropyLoss()
    trainer = gluon.Trainer(net.collect_params(),
                            'sgd', {'learning_rate': lr})
    animator = d2l.Animator(xlabel='epoch', xlim=[0,num_epochs],
                            legend=['train loss','train acc','test acc'])
    timer = d2l.Timer()
    for epoch in range(num_epochs):
        metric = d2l.Accumulator(3)  # train_loss, train_acc, num_examples
        for i, (X, y) in enumerate(train_iter):
            timer.start()
            # Here is the only difference compared to train_epoch_ch3
            X, y = X.as_in_context(ctx), y.as_in_context(ctx)
            with autograd.record():
                y_hat = net(X)
                l = loss(y_hat, y)
            l.backward()
            trainer.step(X.shape[0])
            metric.add(l.sum().asscalar(), d2l.accuracy(y_hat, y), X.shape[0])
            timer.stop()
            train_loss, train_acc = metric[0]/metric[2], metric[1]/metric[2]
            if (i+1) % 50 == 0:
                animator.add(epoch + i/len(train_iter),
                             (train_loss, train_acc, None))
        test_acc = evaluate_accuracy_gpu(net, test_iter)
        animator.add(epoch+1, (None, None, test_acc))
    print('loss %.3f, train acc %.3f, test acc %.3f' % (
        train_loss, train_acc, test_acc))
    print('%.1f examples/sec on %s'%(metric[2]*num_epochs/timer.sum(), ctx))

When I try to run
train_ch5(net, train_iter, test_iter, num_epochs, lr)
I always get this traceback:

Traceback (most recent call last):
  File "lenet.py", line 63, in <module>
    train_ch5(net, train_iter, test_iter, num_epochs, lr)
  File "lenet.py", line 50, in train_ch5
    metric.add(l.sum().asscalar(), d2l.accuracy(y_hat, y), X.shape[0])
TypeError: add() takes 2 positional arguments but 4 were given

But since the code already uses metric = d2l.Accumulator(3), how can add() take only 2 arguments?

I just reran it and there was no error. This issue might be caused by the new version of the MXNet operators. Did you install the NumPy version of MXNet? If not, please refer to http://numpy.d2l.ai/chapter_install/install.html
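
For what it's worth, the message "takes 2 positional arguments but 4 were given" is what Python reports when a method declared with one parameter besides self is called with three values. The number passed to the constructor does not matter here. The sketch below reproduces this with two hypothetical classes, `AccumulatorSingleArg` and `AccumulatorVarArgs`. These are illustrations only, not the actual d2l source, and whether the installed d2l package really defines add() the first way is an assumption that would need to be checked.

```python
class AccumulatorSingleArg:
    """Hypothetical variant: add() expects ONE sequence of values."""

    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, x):
        self.data = [a + b for a, b in zip(self.data, x)]


class AccumulatorVarArgs:
    """Hypothetical variant: add() accepts the values as separate arguments."""

    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def __getitem__(self, idx):
        return self.data[idx]


def try_add(metric):
    """Call add() the way train_ch5 does and report what happens."""
    try:
        metric.add(1.5, 3, 4)
        return "ok"
    except TypeError as err:
        return str(err)


if __name__ == "__main__":
    print(try_add(AccumulatorSingleArg(3)))  # raises the TypeError from the traceback
    print(try_add(AccumulatorVarArgs(3)))    # works
```

One way to see which form is installed is to print inspect.signature(d2l.Accumulator.add) from the standard inspect module and compare it with the call in train_ch5.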

In the implementation of the function evaluate_accuracy_gpu, can we replace
ctx = list(net.collect_params().values())[0].list_ctx()[0]
simply by
ctx = net[0].weight.list_ctx()[0] ?