Neural network for regression with multiple outputs

Hi all - I'm new to MXNet and trying to build a simple neural network for a regression problem with multiple outputs (40 features in, 40 features out). I'm using the Scala API.
I built a custom iterator, and this is where my troubles started: I'm not 100% clear about what the iterator should return. The DataIter interface mandates an IndexedSeq[NDArray], so I assumed one element of the sequence per sample, with the sequence length equal to the batch size. However, the examples seem to put only one element in the sequence: a single NDArray holding the whole batch, which in my case would have shape (batchSize, 40). Is this the case?
Assuming it is, I gave the labels exactly the same shape and then set off building the network.
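For reference, here is a small plain-Scala sketch (no MXNet dependency) of the layout described above: the whole batch lives in one row-major buffer of shape (batchSize, 40) rather than one array per sample. `BatchPacking`, `Batch`, and `pack` are hypothetical names. In the real iterator the flat buffer would presumably be turned into an NDArray (for example via `NDArray.array(values, Shape(batchSize, 40))`) and returned as a one-element IndexedSeq; that step is an assumption to check against the Scala API docs for the installed version.

```scala
// Hypothetical helper, not part of MXNet.
object BatchPacking {
  // Flat row-major batch plus its logical (rows, columns) shape.
  final case class Batch(shape: (Int, Int), values: Array[Float])

  // Packs per-sample feature vectors into one (batchSize, numFeatures) buffer.
  def pack(samples: IndexedSeq[Array[Float]], numFeatures: Int): Batch = {
    require(samples.forall(_.length == numFeatures),
      s"every sample must have $numFeatures features")
    val values = new Array[Float](samples.length * numFeatures)
    for ((sample, row) <- samples.zipWithIndex) {
      // Row-major: sample `row` starts at offset row * numFeatures.
      Array.copy(sample, 0, values, row * numFeatures, numFeatures)
    }
    Batch((samples.length, numFeatures), values)
  }

  def main(args: Array[String]): Unit = {
    val batchSize = 50
    // Dummy data: sample i has all 40 features set to i.
    val samples = IndexedSeq.tabulate(batchSize)(i => Array.fill(40)(i.toFloat))
    val batch = pack(samples, 40)
    println(batch.shape)          // (50,40)
    println(batch.values.length)  // 2000
  }
}
```

The labels would be packed the same way, since they also have 40 values per sample.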

```scala
val batchSize = 50
// my custom DataIter implementations, one per data file
val trainDataIter = new CustomIterator(batchSize, args(0))
val valDataIter = new CustomIterator(batchSize, args(1))

// model definition: 40 inputs -> 40 hidden units (ReLU) -> 40 outputs
val data = Symbol.Variable("data")
val fc1 = Symbol.FullyConnected(name = "fc1")()(Map("data" -> data, "num_hidden" -> 40))
val act1 = Symbol.Activation(name = "relu1")()(Map("data" -> fc1, "act_type" -> "relu"))
val fc2 = Symbol.FullyConnected(name = "fc2")()(Map("data" -> act1, "num_hidden" -> 40))
// regression output layer
val lrm = Symbol.LinearRegressionOutput(name = "lrm")()(Map("data" -> fc2))

val mod = new Module(lrm)

mod.bind(trainDataIter.provideData, Some(trainDataIter.provideLabel))
```
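A quick shape walk-through of the network above, written as plain Scala with a hypothetical `ShapeWalk` helper rather than MXNet calls. It assumes MXNet's usual FullyConnected convention (output (batch, num_hidden), weight (num_hidden, input_dim), bias of length num_hidden) and that LinearRegressionOutput expects labels shaped like its input, which is the assumption made in the post. Every tensor involved is 2-D.

```scala
// Hypothetical shape bookkeeping, not MXNet code.
object ShapeWalk {
  type Shape2 = (Int, Int)

  // FullyConnected on a 2-D input: returns (output shape, weight shape, bias length).
  def fullyConnected(in: Shape2, numHidden: Int): (Shape2, Shape2, Int) =
    ((in._1, numHidden), (numHidden, in._2), numHidden)

  def main(args: Array[String]): Unit = {
    val data: Shape2 = (50, 40)                 // (batchSize, 40 input features)
    val (fc1Out, fc1W, _) = fullyConnected(data, 40)
    val relu1Out = fc1Out                       // ReLU is elementwise: shape unchanged
    val (fc2Out, fc2W, _) = fullyConnected(relu1Out, 40)
    val label = fc2Out                          // regression label shaped like fc2's output
    println(s"fc1 out $fc1Out, weight $fc1W")   // fc1 out (50,40), weight (40,40)
    println(s"fc2 out $fc2Out, weight $fc2W")   // fc2 out (50,40), weight (40,40)
    println(s"label $label")                    // label (50,40)
  }
}
```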

However, what I get out of this is:

```
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: number of dimensions in shape :2 with shape: (50,40) should match the length of the layout: 4 with layout: NCHW
```

I have to admit I'm rather lost here. I'm not using any convolutional layers, so should I be using NCHW at all? If so, what would my input/label shapes be? Something like (batchSize, 1, 1, 40)?
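The exception spells out the rule being enforced: the number of dimensions in the shape must equal the number of characters in the layout string. The sketch below is a hypothetical mirror of that requirement (`LayoutCheck` and `check` are illustrative names, not MXNet's source), showing why (50,40) fails against NCHW while (50,1,1,40), or a 2-character layout, would pass. Whether the iterator's data description can be given a 2-D layout such as "NC", instead of reshaping to 4-D, depends on the MXNet Scala version in use and is an assumption here.

```scala
// Hypothetical mirror of the requirement quoted in the exception message.
object LayoutCheck {
  // Left(message) when the shape rank and the layout length disagree.
  def check(shape: Seq[Int], layout: String): Either[String, Unit] =
    if (shape.length == layout.length) Right(())
    else Left(s"number of dimensions in shape :${shape.length} with shape: " +
      s"${shape.mkString("(", ",", ")")} should match the length of the layout: " +
      s"${layout.length} with layout: $layout")

  def main(args: Array[String]): Unit = {
    println(check(Seq(50, 40), "NCHW"))        // Left(...): rank 2 vs 4 layout characters
    println(check(Seq(50, 1, 1, 40), "NCHW"))  // Right(())
    println(check(Seq(50, 40), "NC"))          // Right(()): 2-D shape, 2-character layout
  }
}
```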
Any help/direction appreciated!