@mli nitpick:
Text and code suggest that the middle column of Graphic 7.2.1 should have a 2x2 MaxPool instead of a 3x3 MaxPool.
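The arithmetic is easy to check. Below is a minimal sketch, assuming the usual output-size formula floor((n + 2p - k) / s) + 1, a 224-pixel input, and five blocks with one 3x3 convolution each; the helper names `out_size` and `after_blocks` are mine, not from the book's code. Only the 2x2 pooling halves the resolution cleanly at every stage.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


def after_blocks(n, pool_kernel, num_blocks=5):
    """Track resolution through blocks of 3x3 conv (padding 1) + max pool (stride 2)."""
    sizes = [n]
    for _ in range(num_blocks):
        n = out_size(n, kernel=3, padding=1)           # conv keeps the resolution
        n = out_size(n, kernel=pool_kernel, stride=2)  # pooling downsamples
        sizes.append(n)
    return sizes


print(after_blocks(224, pool_kernel=2))  # [224, 112, 56, 28, 14, 7]
print(after_blocks(224, pool_kernel=3))  # [224, 111, 55, 27, 13, 6]
```

The input size and block count are assumptions for illustration; the same check works for any other configuration.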
@mli, I think there is a typo in 7.2.1 as it reads:
“The basic building block of classic convolutional networks is a sequence of the following layers: (i) a convolutional layer (with padding to maintain the resolution), (ii) a nonlinearity such as a ReLU, One VGG block consists of a sequence of convolutional layers, followed by a max pooling layer for spatial downsampling.”
I believe it is missing something along these lines: “(iii) a max pooling layer for spatial downsampling.” before “One VGG block …”. Or, even better, the two sentences could be merged, since they say much the same thing and are redundant as written.