Referring to the MXNet codebase, specifically the pooling function: in that case, shouldn't it be referred to as `pad_type`?
No, padding is very different. `pooling_convention` was added to address the difference between MXNet and Caffe, for people trying to replicate Caffe networks in MXNet. Please refer to the exact definition of pooling convention.
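For concreteness, the two existing conventions differ only in how the output size is rounded. A minimal sketch of my reading of the formulas (illustrative only, not the operator's actual source):

```python
import math

# One concrete case of the Caffe/MXNet mismatch that pooling_convention
# addresses: input 6, kernel 3, stride 2, pad 0.
in_w, k, s, p = 6, 3, 2, 0
span = float(in_w + 2 * p - k) / s
print(int(math.floor(span)) + 1)  # 2 -> 'valid' (MXNet default, rounds down)
print(int(math.ceil(span)) + 1)   # 3 -> 'full' (Caffe-style, rounds up)
```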
Yes, I took a look at the Pooling method.
One of the feature requests was to add "same" padding (alongside the currently available `valid` and `full`). In that sense, I guess, the type of padding is encoded in `pooling_convention`.
Don’t you think so?
I suppose. But you're using TensorFlow's padding terminology and trying to find an equivalent for it in MXNet. TensorFlow uses this terminology for padding in both convolution and pooling. In MXNet, padding is specified as a symmetric integer `pad` value (unless you use the `pad` operator). `pooling_convention` was added to handle a very specific implementation corner case. I don't expect `pooling_convention` to be extended to add `same` padding, or an equivalent of that to be added to the convolution operator.
Unable to understand why `pooling_convention` can't be extended to add `same` padding.
Internally, a user has requested the inclusion of `same` in Max Pooling for padding (or, in MXNet lingo, pooling convention).
If you can already achieve the effect of `same` padding using the provided integer padding argument, why would you add `same` to pooling convention as well? What happens if one specifies `same` but also provides the `pad` argument? Also, if you pay close attention to the formula used for `pooling_convention`, you'll notice that `p`, which signifies the amount of padding, is already in the formula. Personally, I think adding `same` is inconsistent with convolution and will make the API confusing. Feel free to seek out opinions from others in the community or send an email to the dev email list.
I was under the assumption that it works like this:

1. Take `input_data`
2. Apply padding around it with a padding value of negative infinity
3. Apply the pooling convention based on stride

Can you verify if this is right?
That's effectively what happens (although no actual padding with `-inf` is done; rather, the padded area is assumed to have no impact on the outcome, i.e. for max-pool it's as if the value is `-inf`, for avg-pool it's as if the value is `0`, and so on). `SAME` padding in TensorFlow (as described here) calculates the required padding so that if stride is 1, the output size equals the input size. In MXNet, you can achieve this behavior by specifying the padding value to be equal to `kernel//2`. This implementation, however, does not allow you to replicate `SAME` behavior if the kernel size is an even number, because padding is applied symmetrically. In the case of an even-sized kernel, you need to apply asymmetric padding separately using the `pad` operator to replicate the behavior of `SAME` padding in TensorFlow.
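To make the odd-kernel case concrete, here is a sketch (assuming the standard `mx.nd.Pooling` operator; treat it as illustrative rather than a tested recipe):

```python
import mxnet as mx

# NCHW input: batch 1, 1 channel, 5x5 spatial
x = mx.nd.arange(25).reshape((1, 1, 5, 5))

# Odd kernel (3x3): symmetric pad = kernel // 2 = 1 reproduces TF 'SAME'
# output sizes. The padded region acts as if it were -inf for max-pool
# (and 0 for avg-pool), so it never affects the result.
y = mx.nd.Pooling(x, kernel=(3, 3), stride=(1, 1), pad=(1, 1),
                  pool_type='max', pooling_convention='valid')
print(y.shape)  # (1, 1, 5, 5) -- same spatial size as the input
```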
With respect to your table, it isn't very clear. It looks like you're using a kernel size of 2 in your example, but it's not quite clear what you expect to happen when the pooling convention is `same`, and it doesn't look like you're trying to replicate TensorFlow's `SAME` padding behavior. If you explain what you actually want to do, I might be able to help more effectively.
A. So basically, it is currently a painstaking process if someone wants to replicate TensorFlow's `same` padding. How can we make it simpler?

- Either incorporate it in the function,
- or keep it as it is but state it explicitly in the documentation.
B. Yes, the kernel size is 2. I'm trying to see if the outputs are as expected by tweaking three things (`stride`, `pad`, and `pooling_convention`).
Moreover, because the formulas for `output_width` differ across these pooling conventions, I think we need to incorporate that by tweaking the function definition:

- Full: `ceil(float(input_width + 2 * pad - filter_width) / float(stride)) + 1`
- Valid: `floor(float(input_width + 2 * pad - filter_width) / float(stride)) + 1`
- Same: `ceil(float(input_width + 2 * pad) / float(stride))`
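For what it's worth, here is how I imagine that function definition; `full` and `valid` follow the formulas above, while `same` is the proposed (hypothetical) convention, not something MXNet implements:

```python
import math

def output_width(input_width, filter_width, stride, pad, convention):
    span = float(input_width + 2 * pad - filter_width) / float(stride)
    if convention == 'full':
        return int(math.ceil(span)) + 1
    if convention == 'valid':
        return int(math.floor(span)) + 1
    if convention == 'same':  # proposed, not implemented in MXNet
        return int(math.ceil(float(input_width + 2 * pad) / float(stride)))
    raise ValueError('unknown pooling_convention: %r' % convention)

# input_width=5, filter_width=2, stride=2, pad=0:
#   full  -> ceil(1.5) + 1  = 3
#   valid -> floor(1.5) + 1 = 2
#   same  -> ceil(5 / 2)    = 3
```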
I agree that replicating TF's `SAME` can be a bit of a head-scratcher. As I explained, if your kernel is odd in dimension, then use `pad=kernel//2`; if the kernel is even, then use the `pad` operator with the left pad set to `kernel/2` and the right pad set to `kernel/2 - 1` to get TF's `SAME` behavior.
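A sketch of the even-kernel case, assuming the `mx.nd.pad` and `mx.nd.Pooling` operators (illustrative only; a very negative constant stands in for `-inf` so the padding never wins a max):

```python
import mxnet as mx

x = mx.nd.arange(25).reshape((1, 1, 5, 5))  # NCHW, 5x5 spatial
k = 2  # even kernel size

# pad_width is (before, after) per axis in NCHW order: no padding on
# batch/channel, then top=k//2=1, bottom=k//2-1=0, left=1, right=0.
padded = mx.nd.pad(x, mode='constant', constant_value=-1e38,
                   pad_width=(0, 0, 0, 0, 1, 0, 1, 0))
y = mx.nd.Pooling(padded, kernel=(k, k), stride=(1, 1), pad=(0, 0),
                  pool_type='max', pooling_convention='valid')
print(y.shape)  # (1, 1, 5, 5) -- output size matches TF 'SAME' with stride 1
```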
The formula you show for the output size under `same` is just a formula that results in an output with the same size as the input if `pad=0` and `stride=1`. I'm not really sure what objective this will achieve. The reason `filter_width` is subtracted in the formula is due to the nature of how correlation (or convolution) with a kernel works. If you wanted `output_width` to result in what you're proposing under the `same` pooling convention, the underlying implementation would need to re-calculate the left pad and right pad to achieve the requested `output_width` based on `pad` and the convention.
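For illustration, a hypothetical sketch of that recalculation (here using TensorFlow's rule of putting the extra pad on the right, and assuming the explicit `pad` argument is 0; this is not anything MXNet implements):

```python
def same_pads(input_width, filter_width, stride):
    out_w = -(-input_width // stride)  # ceil(input_width / stride)
    total = max((out_w - 1) * stride + filter_width - input_width, 0)
    left = total // 2
    return left, total - left          # (pad_left, pad_right)

print(same_pads(5, 2, 1))  # (0, 1): an even kernel needs asymmetric padding
print(same_pads(5, 3, 1))  # (1, 1): an odd kernel pads symmetrically
```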