http://d2l.ai/chapter_convolutionalneuralnetworks/lenet.html
Convolutional Neural Networks (LeNet)
Hello!
First of all, thank you for the great learning material!
In the chapter on the LeNet architecture, you mention that your implementation matches the historical definition of LeNet-5 (Gradient-Based Learning Applied to Document Recognition) except for the last layer, but I found two other inconsistencies when reading subsection B, LeNet-5.

The LeNet paper does not describe the pooling layer as an average pooling layer, but rather as a layer that sums over each 2x2 neighborhood of the input feature map, multiplies the sum by a trainable weight, adds a trainable bias, and finally passes the result through a sigmoidal function.
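To make the difference concrete, here is a minimal pure-Python sketch of the subsampling step as described above, next to plain average pooling. The function names, the list-of-lists feature map, and the choice of the logistic sigmoid as the default squashing function are illustrative assumptions, not code from the book or the paper.

```python
import math


def sigmoid(a):
    """Logistic sigmoid, used here only as an example squashing function."""
    return 1.0 / (1.0 + math.exp(-a))


def subsample_2x2(fmap, weight, bias, squash=sigmoid):
    """Sum each non-overlapping 2x2 block, scale it by a single weight,
    add a single bias, then apply the squashing function.

    `fmap` is a list of rows (hypothetical layout for this sketch);
    `weight` and `bias` stand in for the trainable parameters.
    """
    rows, cols = len(fmap), len(fmap[0])
    out = []
    for i in range(0, rows - 1, 2):
        out_row = []
        for j in range(0, cols - 1, 2):
            block_sum = (fmap[i][j] + fmap[i][j + 1]
                         + fmap[i + 1][j] + fmap[i + 1][j + 1])
            out_row.append(squash(weight * block_sum + bias))
        out.append(out_row)
    return out


def avg_pool_2x2(fmap):
    """Average pooling is the special case weight=0.25, bias=0, no squashing."""
    return subsample_2x2(fmap, 0.25, 0.0, squash=lambda a: a)


fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(avg_pool_2x2(fmap))
print(subsample_2x2(fmap, weight=0.1, bias=-1.0))
```

So average pooling can be seen as this layer with the weight fixed at 0.25, the bias fixed at 0, and no nonlinearity; the version in the paper has two extra trainable parameters per feature map plus the squashing step.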

According to the LeNet paper, the activation function used in both the convolutional and fully connected layers is a scaled hyperbolic tangent, not the sigmoid used in the code. These two functions look similar but have different output ranges (http://m.wolframalpha.com/input/?i=tanh(a)%2C+sigmoid(a))
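A small sketch of the range difference, using only the standard library. The constants 1.7159 and 2/3 are the amplitude and slope I recall from the paper's appendix; treat them, and the helper names, as assumptions to double-check against the paper.

```python
import math


def sigmoid(a):
    """Logistic sigmoid: output in (0, 1), equal to 0.5 at a = 0."""
    return 1.0 / (1.0 + math.exp(-a))


def scaled_tanh(a, amplitude=1.7159, slope=2.0 / 3.0):
    """Scaled tanh: output in (-amplitude, amplitude), equal to 0 at a = 0."""
    return amplitude * math.tanh(slope * a)


for a in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print('a=%5.1f  sigmoid=%.4f  scaled_tanh=%.4f'
          % (a, sigmoid(a), scaled_tanh(a)))

# Plain tanh is just a shifted and rescaled sigmoid: tanh(a) = 2*sigmoid(2a) - 1
assert abs(math.tanh(0.7) - (2.0 * sigmoid(1.4) - 1.0)) < 1e-12
```

The shapes are the same up to shifting and rescaling, but the sigmoid is not zero-centered, while the scaled tanh is symmetric around zero and maps 1 to roughly 1 and -1 to roughly -1.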
If there is something I missed and your implementation of LeNet5 is correct, please let me know.
Cheers,
Martin
Hey Martin,
Pooling was called subsampling in the original paper. According to page 6 of the paper:
"This can be achieved with a so-called subsampling layers which performs a local averaging and a subsampling reducing the resolution of the feature map and reducing the sensitivity of the output to shifts and distortions"
Also, as for tanh vs sigmoid, it seems that tanh converges faster than sigmoid (especially useful 20 years ago, when compute power was limited).
Hopefully it helps!
Rachel
Just want to point out that the link to the Multilayer Perceptron section on this page of the book no longer works:
http://www.d2l.ai/chapter_deeplearningbasics/mlpscratch.md
Hi,
Thanks for the learning material!
But I have a problem when running the code.
# Save to the d2l package.
def train_ch5(net, train_iter, test_iter, num_epochs, lr, ctx=d2l.try_gpu()):
    net.initialize(force_reinit=True, ctx=ctx, init=init.Xavier())
    loss = gluon.loss.SoftmaxCrossEntropyLoss()
    trainer = gluon.Trainer(net.collect_params(),
                            'sgd', {'learning_rate': lr})
    animator = d2l.Animator(xlabel='epoch', xlim=[0,num_epochs],
                            legend=['train loss','train acc','test acc'])
    timer = d2l.Timer()
    for epoch in range(num_epochs):
        metric = d2l.Accumulator(3) # train_loss, train_acc, num_examples
        for i, (X, y) in enumerate(train_iter):
            timer.start()
            # Here is the only difference compared to train_epoch_ch3
            X, y = X.as_in_context(ctx), y.as_in_context(ctx)
            with autograd.record():
                y_hat = net(X)
                l = loss(y_hat, y)
            l.backward()
            trainer.step(X.shape[0])
            metric.add(l.sum().asscalar(), d2l.accuracy(y_hat, y), X.shape[0])
            timer.stop()
            train_loss, train_acc = metric[0]/metric[2], metric[1]/metric[2]
            if (i+1) % 50 == 0:
                animator.add(epoch + i/len(train_iter),
                             (train_loss, train_acc, None))
        test_acc = evaluate_accuracy_gpu(net, test_iter)
        animator.add(epoch+1, (None, None, test_acc))
    print('loss %.3f, train acc %.3f, test acc %.3f' % (
        train_loss, train_acc, test_acc))
    print('%.1f exampes/sec on %s'%(metric[2]*num_epochs/timer.sum(), ctx))
When I try to run
train_ch5(net, train_iter, test_iter, num_epochs, lr)
I always get this traceback:
Traceback (most recent call last):
  File "lenet.py", line 63, in <module>
    train_ch5(net, train_iter, test_iter, num_epochs, lr)
  File "lenet.py", line 50, in train_ch5
    metric.add(l.sum().asscalar(), d2l.accuracy(y_hat, y), X.shape[0])
TypeError: add() takes 2 positional arguments but 4 were given
But since the code already creates metric = d2l.Accumulator(3), how can add() take only 2 arguments?
I just reran it and there was no error. This issue might be caused by the new version of the MXNet operators. Did you install the numpy version of MXNet? If not, please refer to http://numpy.d2l.ai/chapter_install/install.html
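For what it's worth, the wording of the error points at the signature of add() rather than at the 3 passed to the constructor: "takes 2 positional arguments" means self plus one parameter. The sketch below reproduces that kind of mismatch with two hypothetical classes; they are not the actual d2l source, and whether the installed d2l package looked like the first one is only a guess.

```python
class SingleArgAccumulator:
    """Hypothetical accumulator whose add() expects ONE iterable of values."""

    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, values):
        self.data = [a + float(b) for a, b in zip(self.data, values)]

    def __getitem__(self, i):
        return self.data[i]


class VarArgsAccumulator(SingleArgAccumulator):
    """Hypothetical accumulator whose add() accepts separate values."""

    def add(self, *values):
        super().add(values)


def call_with_three_values_fails(acc):
    """Return True if acc.add(1.5, 2, 3) raises TypeError."""
    try:
        acc.add(1.5, 2, 3)
    except TypeError as err:
        print('TypeError:', err)
        return True
    return False


print(call_with_three_values_fails(SingleArgAccumulator(3)))  # signature mismatch
print(call_with_three_values_fails(VarArgsAccumulator(3)))    # call is accepted
```

If the installed package turns out to match the first pattern, either updating the package or calling metric.add((a, b, c)) with a single tuple would be consistent with that signature.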
In the implementation of the function evaluate_accuracy_gpu, can we replace
ctx = list(net.collect_params().values())[0].list_ctx()[0]
simply by
ctx = net[0].weight.list_ctx()[0] ?
@gold_piggy @mli
I think there is an error in the description of the output shape of the 1st conv layer.
At the end of section 6.1.1:
The convolutional layer uses a kernel with a height and width of 5, which with only 2 pixels of padding in the first convolutional layer and none in the second convolutional layer leads to reductions in both height and width by 2 and 4 pixels, respectively.
The 1st conv layer actually has 2 pixels of padding on each side of the input, so I think there is no reduction in the 1st conv output (it stays 28 x 28).
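A quick check with the usual output-size formula, floor((n + 2p - k) / s) + 1. The helper name is made up for this sketch, and the 2x2, stride-2 pooling between the two conv layers is my assumption about the network in the chapter.

```python
def conv_output_size(n, kernel, padding=0, stride=1):
    """Output height/width of a conv or pooling layer for an input of size n.

    `padding` is the number of pixels added on EACH side.
    """
    return (n + 2 * padding - kernel) // stride + 1


n = 28                                                   # input height/width
conv1 = conv_output_size(n, kernel=5, padding=2)         # 1st conv layer
pool1 = conv_output_size(conv1, kernel=2, stride=2)      # assumed 2x2 pooling
conv2 = conv_output_size(pool1, kernel=5, padding=0)     # 2nd conv layer
print(conv1, pool1, conv2)
```

With 2 pixels of padding on each side, the 5x5 kernel leaves the size unchanged in the first conv layer (reduction of 0), and only the unpadded second conv layer shrinks its input, by 4 pixels.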