Batch size could be any of the dimensions, but normally it is the first dimension of your NDArray; in my example, that would be x_{1}. Think of FC as a mapping from `(dim0, input_dim)` to `(dim0, output_dim)`. When `flatten` is True, you can think of FC as reshaping the input from (x_{1}, x_{2}, …, x_{n-1}, x_{n}) to (x_{1}, x_{2} * x_{3} * … * x_{n-1} * x_{n}) before doing the dense calculation. In this case, `dim0` is x_{1} and `input_dim` is x_{2} * x_{3} * … * x_{n-1} * x_{n}, so the output is (x_{1}, `output_dim`). If batch size is the first dimension (i.e. x_{1}), this is equivalent to getting a `(batch_size, output_dim)` output.
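To make the `flatten=True` case concrete, here is a minimal NumPy sketch of the reshape-then-dense idea described above. It is not the actual FC implementation: the function name `dense_flatten_true` is made up for illustration, and the weight layout `(output_dim, input_dim)` is an assumption on my part.

```python
import numpy as np

def dense_flatten_true(x, weight, bias):
    """Emulate FC with flatten=True (illustrative helper, not a library call).

    x:      (x1, x2, ..., xn)
    weight: (output_dim, input_dim), assumed layout, input_dim = x2 * ... * xn
    bias:   (output_dim,)
    """
    dim0 = x.shape[0]
    # Collapse every dimension after the first into one: (x1, x2 * ... * xn)
    x2d = x.reshape(dim0, -1)
    # Dense calculation: (dim0, input_dim) -> (dim0, output_dim)
    return x2d @ weight.T + bias

x = np.ones((4, 3, 5))       # x1 = 4 (batch size), x2 = 3, x3 = 5
weight = np.ones((7, 15))    # output_dim = 7, input_dim = 3 * 5 = 15
bias = np.zeros(7)

out = dense_flatten_true(x, weight, bias)
print(out.shape)             # (4, 7)
```

Only the first dimension survives; everything else is folded into `input_dim`.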

When `flatten` is False, you can think of FC as reshaping the input from (x_{1}, x_{2}, …, x_{n-1}, x_{n}) to (x_{1} * x_{2} * … * x_{n-1}, x_{n}) before doing the dense calculation. In this case, `dim0` is x_{1} * x_{2} * … * x_{n-1} and `input_dim` is x_{n}. There is also a reshape after the dense calculation that restores the leading dimensions, giving (x_{1}, x_{2}, …, x_{n-1}, `output_dim`). If batch size is the first dimension (i.e. x_{1}), this is equivalent to getting a (`batch_size`, x_{2}, …, x_{n-1}, `output_dim`) output.
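The `flatten=False` case can be sketched the same way in NumPy. Again, this only illustrates the two reshapes described above and is not the real FC code: `dense_flatten_false` is a hypothetical helper, and the weight layout `(output_dim, input_dim)` is assumed.

```python
import numpy as np

def dense_flatten_false(x, weight, bias):
    """Emulate FC with flatten=False (illustrative helper, not a library call).

    x:      (x1, x2, ..., xn)
    weight: (output_dim, input_dim), assumed layout, input_dim = xn
    bias:   (output_dim,)
    """
    lead = x.shape[:-1]                  # (x1, ..., x_{n-1})
    # Fold all leading dimensions into one: (x1 * ... * x_{n-1}, xn)
    x2d = x.reshape(-1, x.shape[-1])
    # Dense calculation: (dim0, input_dim) -> (dim0, output_dim)
    y2d = x2d @ weight.T + bias
    # Restore the leading dimensions: (x1, ..., x_{n-1}, output_dim)
    return y2d.reshape(*lead, weight.shape[0])

x = np.ones((4, 3, 5))       # x1 = 4 (batch size), x2 = 3, x3 = 5
weight = np.ones((7, 5))     # output_dim = 7, input_dim = x3 = 5
bias = np.zeros(7)

out = dense_flatten_false(x, weight, bias)
print(out.shape)             # (4, 3, 7)
```

Here only the last dimension is replaced by `output_dim`; the dense layer is effectively applied independently along every leading index.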