For educational purposes I want a linear regression example that uses `mx.sym.dot(X, w)` instead of `mx.sym.FullyConnected(X, num_hidden=1)`; see the code example below. Is there a way to do this?

I know I can do a similar thing with `nd` and autograd instead of `sym`, but then I also have to implement SGD by hand, which is not what I am looking for …

```
import mxnet as mx
import numpy as np

m = 1000
batch_size = 100
nVars = 4
data = np.random.normal(0, 1, (m, nVars))
labels = -10 * data[:, 0] + data[:, 1] * np.pi + 5 * np.sqrt(abs(data[:, 2])) - data[:, 3] + np.random.normal(0, 1, m) * 2
train_iter = mx.io.NDArrayIter(data={'data': data}, label={'labels': labels}, batch_size=batch_size)
X = mx.sym.Variable('data', shape=(batch_size, nVars))
y = mx.sym.Variable('labels', shape=(batch_size,))
# note: shapes must be tuples, i.e. (nVars,) rather than (nVars)
w = mx.sym.var(name='theta', shape=(nVars,), init=mx.initializer.Normal())
# this works as expected
fc = mx.sym.FullyConnected(data=X, name='fc1', num_hidden=1)
yhat = mx.sym.LinearRegressionOutput(fc, label=y, name='yhat')
model = mx.mod.Module(symbol=yhat, data_names=['data'], label_names=['labels'])
train_iter.reset()
model.fit(train_iter, num_epoch=10)
pred = model.predict(train_iter).asnumpy().flatten()
# with this solution I cannot figure out how to make the optimizer improve w.
fc_dot = mx.sym.dot(X, w)
yhat_dot = mx.sym.LinearRegressionOutput(fc_dot, label=y, name='yhat_dot')
model_dot = mx.mod.Module(symbol=yhat_dot, data_names=['data'], label_names=['labels'])
train_iter.reset()
model_dot.fit(train_iter, num_epoch=10)
pred_dot = model_dot.predict(train_iter).asnumpy().flatten()
np.mean(pred_dot - labels)
```
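For reference, this is roughly what the `nd`/autograd route boils down to once you write the SGD update yourself. The sketch below is pure NumPy on a made-up linear data set (not the data set above, and with hand-derived gradients instead of autograd), just to show the manual update loop I'd like to avoid:

```python
import numpy as np

rng = np.random.default_rng(0)
m, nVars, batch_size = 1000, 4, 100
X = rng.normal(0, 1, (m, nVars))
true_w = np.array([-10.0, np.pi, 2.0, -1.0])  # hypothetical ground-truth weights
y = X @ true_w + rng.normal(0, 1, m)

w = rng.normal(0, 1, nVars)  # parameters to learn
lr = 0.1                     # learning rate
for epoch in range(50):
    for i in range(0, m, batch_size):
        Xb, yb = X[i:i + batch_size], y[i:i + batch_size]
        err = Xb @ w - yb                    # residuals on this mini-batch
        grad = 2 * Xb.T @ err / batch_size   # gradient of the mean squared error
        w -= lr * grad                       # the manual SGD step
print(w)  # should end up close to true_w
```

The point is that the update step (`w -= lr * grad`) has to be written out explicitly, whereas `Module.fit` handles it for you.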