mxnet.symbol.FullyConnected(data=None, weight=None, bias=None, num_hidden=_Null, no_bias=_Null, flatten=_Null, name=None, attr=None, out=None, **kwargs)
If flatten is set to be false, then the shapes are:
data: (x1, x2, …, xn, input_dim)
weight: (num_hidden, input_dim)
bias: (num_hidden,)
out: (x1, x2, …, xn, num_hidden)
So what do xn and input_dim mean here? Suppose an input image of shape (28, 28, 3) with batch size 8.
Basically, if flatten is False, the FC layer is applied to the last dimension of the input array: FC maps input_dim --> num_hidden. For example, if data is (x1, x2, …, xn-1, xn), the output will be (x1, x2, …, xn-1, num_hidden) and the weight dimensions will be (num_hidden, xn).
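This last-dimension behavior can be sketched with NumPy standing in for MXNet (the shapes below are illustrative assumptions, not values from the thread):

```python
import numpy as np

# Sketch of FullyConnected with flatten=False, using NumPy in place of
# MXNet. num_hidden and the data shape are assumed for illustration.
num_hidden = 5
data = np.random.rand(8, 28, 28, 3)     # (x1, x2, x3, input_dim)
weight = np.random.rand(num_hidden, 3)  # (num_hidden, input_dim)
bias = np.random.rand(num_hidden)       # (num_hidden,)

# The dense mapping touches only the last axis: input_dim -> num_hidden.
out = data @ weight.T + bias
print(out.shape)  # (8, 28, 28, 5)
```

All leading dimensions pass through unchanged; only the trailing input_dim axis is replaced by num_hidden.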
Thanks. But I still don’t get it.
If (x1, …, xn) is the data shape, where is batch_size? I see that when flatten is True there is a batch_size dimension, but I don't understand why it's missing when flatten is False.
And what is input_dim here? If (x1, …, xn) is (28, 28, 1), would input_dim be 3? And would the output shape be (x1, …, xn-1, num_hidden)? Isn't that different from the official documentation, which writes (x1, …, xn, num_hidden)? Or is that a typo?
I don't really understand the flatten is False case. When flatten is True it's quite simple, and other frameworks (like TF) document similar behavior.
Batch size could be any of the dimensions, but normally batch size is the first dimension of your NDArray; in my example, batch size would be x1. Think of your FC as a mapping from (dim0, input_dim) to (dim0, output_dim). Now when flatten is True, you can think of FC as doing a reshape from (x1, x2, …, xn-1, xn) to (x1, x2 * x3 * … * xn-1 * xn) before the dense calculation. In this case, dim0 is x1 and input_dim is x2 * x3 * … * xn-1 * xn. The output in this case is (x1, output_dim). If batch size is the first dimension (i.e., x1), this is equivalent to getting a (batch_size, output_dim) output.
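The flatten is True case can be checked with a quick NumPy sketch (NumPy stands in for MXNet here; output_dim and the input shape are assumptions):

```python
import numpy as np

# flatten=True sketch: collapse every axis after the first, then apply
# one dense mapping. NumPy stands in for MXNet; shapes are assumed.
output_dim = 10
x = np.random.rand(8, 28, 28, 3)               # (x1, x2, x3, x4)
weight = np.random.rand(output_dim, 28 * 28 * 3)

flat = x.reshape(x.shape[0], -1)               # (x1, x2*x3*x4) = (8, 2352)
out = flat @ weight.T                          # (x1, output_dim)
print(out.shape)  # (8, 10)
```

Here dim0 is x1 = 8 and input_dim is x2 * x3 * x4 = 2352, matching the description above.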
Now when flatten is False, you can think of FC as doing a reshape from (x1, x2, …, xn-1, xn) to (x1 * x2 * … * xn-1, xn) before the dense calculation. In this case, dim0 is x1 * x2 * … * xn-1 and input_dim is xn. There is also a reshape after the dense calculation that recovers the initial shape, giving (x1, x2, …, xn-1, output_dim). If batch size is the first dimension (i.e., x1), this is equivalent to getting a (batch_size, x2, …, xn-1, output_dim) output.
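The reshape-before-and-after view of flatten is False can be sketched the same way (NumPy in place of MXNet; shapes are assumptions):

```python
import numpy as np

# flatten=False sketch mirroring the explanation: reshape to
# (x1*...*xn-1, xn), apply the dense mapping, then restore the
# leading dimensions. NumPy stands in for MXNet; shapes are assumed.
output_dim = 10
x = np.random.rand(8, 28, 28, 3)      # (x1, x2, x3, xn)
weight = np.random.rand(output_dim, 3)  # (output_dim, xn)

flat = x.reshape(-1, x.shape[-1])     # (x1*x2*x3, xn) = (6272, 3)
out = (flat @ weight.T).reshape(*x.shape[:-1], output_dim)
print(out.shape)  # (8, 28, 28, 10)
```

Here dim0 is x1 * x2 * x3 = 6272 and input_dim is xn = 3; the final reshape recovers (x1, x2, x3, output_dim).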
OK, I see. So input_dim is just xn. I was thinking it meant the number of dimensions of the whole input. Isn't it odd to use two different notations for the same parameter of a single function?
So in short, what you said is:
"
When flatten is True: input shape (x1, x2, …, xn-1, xn) -> output shape (x1, num_hidden)
When flatten is False: input shape (x1, x2, …, xn-1, xn) -> output shape (x1, x2, …, xn-1, num_hidden)
And x1 is usually batch_size.
"
Am I right?
Thanks a lot for your patience. This answers my question.