Does pooling_convention refer to the type of padding being used?


I’m referring to the MXNet codebase, specifically the Pooling function.

In that case, shouldn’t it be called pad_type?


No, padding is very different. pooling_convention was added to address a difference between MXNet and Caffe, for people trying to replicate Caffe networks in MXNet. Please refer to the exact definition of pooling_convention.


Yes, I took a look at the Pooling method.
One of the feature requests was to add “same” padding (alongside valid and full, which are currently available). In that sense, I guess, the type of padding is encoded in pooling_convention.

Don’t you think so?


I suppose. But you’re using TensorFlow’s padding terminology and trying to find an equivalent for it in MXNet. TensorFlow uses this terminology for padding in both convolution and pooling. In MXNet, padding is specified as a symmetric integer pad value (unless you use the pad operator). pooling_convention was added to handle a very specific implementation corner case. I don’t expect pooling_convention to be extended to add same padding, or an equivalent of that to be added to the convolution operator.


I don’t understand why pooling_convention can’t be extended to add ‘same’ padding.

Internally, a user has requested the inclusion of same as a padding option in max pooling (or, in MXNet lingo, as a pooling convention).


If you can already achieve the effect of same padding using the provided integer padding argument, why would you add same to pooling convention as well? What happens if one specifies same but also provides pad argument? Also if you pay close attention to the formula used for pooling_convention, you’ll notice that p, which signifies the amount of padding, is already in the formula. Personally I think adding same is inconsistent with convolution and will make the API confusing. Feel free to seek out opinions from others in the community or send an email to the dev email list.
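
For reference, here is a minimal sketch of the output-size formulas given in the MXNet Pooling documentation for the two existing conventions. The function name pooled_size is made up for illustration; x, k, p and s stand for input width, kernel width, pad and stride along one axis.

```python
def pooled_size(x, k, p, s, convention="valid"):
    """Output width along one axis (hypothetical helper, not an MXNet API)."""
    span = x + 2 * p - k
    if convention == "valid":
        # floor((x + 2p - k) / s) + 1
        return span // s + 1
    if convention == "full":
        # ceil((x + 2p - k) / s) + 1, using integer ceiling division
        return -(-span // s) + 1
    raise ValueError("unknown convention: %r" % convention)


# The two conventions differ only when the stride does not divide the span.
print(pooled_size(7, 2, 0, 2, "valid"))  # 3
print(pooled_size(7, 2, 0, 2, "full"))   # 4
```

Note that p enters both formulas the same way; the convention only decides whether a partial last window is dropped or kept.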


I was assuming the following sequence:

  1. input_data

  2. Apply padding around it with padding value (negative infinity)

  3. Apply pooling convention based on stride

Can you verify if this is right -


That’s effectively what happens, although no actual -inf padding is performed. Instead, the padded area is treated as having no impact on the outcome: for max-pool it is as if the value were -inf, for avg-pool as if it were 0, and so on. SAME padding in TensorFlow (as described here) calculates the required padding so that, if stride is 1, the output size equals the input size. In MXNet, you can achieve this behavior by setting pad equal to kernel//2. This does not, however, let you replicate SAME behavior when the kernel size is even, because padding is applied symmetrically: pad=kernel//2 on both sides then yields one extra output element. For an even-size kernel, you need to apply asymmetric padding separately using the pad operator to replicate the behavior of ‘SAME’ padding in TensorFlow.
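
To make the odd/even difference concrete, here is a small pure-Python sketch of 1-D max pooling with symmetric padding. It is only an illustration: max_pool_1d is a made-up helper, and it materializes the -inf padding for clarity, which the real operator does not do.

```python
def max_pool_1d(data, kernel, stride=1, pad=0):
    """1-D max pooling with symmetric padding and floor-based output size."""
    neg_inf = float("-inf")
    # Padded cells can never win a max, so -inf stands in for "no impact".
    padded = [neg_inf] * pad + list(data) + [neg_inf] * pad
    out_len = (len(padded) - kernel) // stride + 1
    return [max(padded[i * stride:i * stride + kernel]) for i in range(out_len)]


data = [1, 3, 2, 5, 4]

# Odd kernel: pad = kernel // 2 keeps the length at stride 1.
print(max_pool_1d(data, kernel=3, pad=1))  # [3, 3, 5, 5, 5]

# Even kernel: the same rule gives one element too many.
print(max_pool_1d(data, kernel=2, pad=1))  # [1, 3, 3, 5, 5, 4]
```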

With respect to your table, it isn’t very clear. Looks like you’re using kernel size of 2 in your example. It’s not quite clear what you expect to happen when pooling convention is same. It doesn’t look like you’re trying to replicate tensorflow’s behavior of SAME padding. If you explain what you actually want to do, I might be able to help more effectively.


A. So basically, replicating TensorFlow’s same padding is currently a painstaking process.
How can it be made simpler?
Either by incorporating it into the function,
or by keeping the function as it is but documenting the workaround explicitly.

B. Yes, the kernel size is 2.
I’m trying to see whether the outputs are as expected when tweaking three things (stride, pad and pooling_convention).


Moreover, because the formulae for output_width differ between these pooling conventions, I think we need to incorporate that by tweaking the function definition:

Full : ceil(float(input_width + 2 * pad - filter_width + 1) / float(stride))

Valid : floor(float(input_width + 2 * pad - filter_width + 1) / float(stride))

Same : ceil(float(input_width + 2 * pad) / float(stride))


I agree that replicating TF’s SAME can be a bit of a head-scratcher. As I explained, if your kernel size is odd, use pad=kernel//2. If it is even, use the pad operator with left pad set to kernel/2-1 and right pad set to kernel/2 (TF puts the extra padding element on the right/bottom) to get TF’s SAME behavior at stride 1.

The formula you show for output size under same is just a formula that results in an output with the same size as the input if pad=0 and stride=1. I’m not really sure what objective this would achieve. The reason filter_width is subtracted in the formula is the nature of how correlation (or convolution) with a kernel works. If you wanted output_width to result in what you’re proposing under a same pooling convention, the underlying implementation would need to re-calculate left-pad and right-pad to be able to achieve the requested output_width based on pad and convention.
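
As a rough sketch of what that recalculation would look like, the following follows the rule TensorFlow documents for SAME padding: the output width is ceil(input_width / stride), and any odd leftover padding goes on the right. The helper name same_pads is hypothetical and is not part of MXNet or TensorFlow.

```python
def same_pads(input_width, filter_width, stride):
    """Return (left_pad, right_pad) that produce a SAME-style output width."""
    output_width = -(-input_width // stride)  # ceil(input_width / stride)
    # Total padding needed so the last window still fits inside the input.
    total = max((output_width - 1) * stride + filter_width - input_width, 0)
    left = total // 2
    right = total - left  # the extra element, if any, goes on the right
    return left, right


print(same_pads(5, 3, 1))  # (1, 1): odd kernel, symmetric, equals kernel // 2
print(same_pads(5, 2, 1))  # (0, 1): even kernel needs asymmetric padding
print(same_pads(5, 4, 1))  # (1, 2)
print(same_pads(6, 2, 2))  # (0, 0): stride divides the input, no padding
```

The asymmetric cases are the ones a single integer pad argument cannot express, which is why the pad operator is needed for them.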