http://d2l.ai/chapter_linear-networks/linear-regression.html

# Linear Regression

What makes linear regression *linear* is that we assume that the output truly can be expressed as a *linear* combination of the input features.

This seems different from what I’ve known: linear regression is linear w.r.t the parameters instead of the inputs.

It is actually the case that linear regression assumes the output is a linear combination of the input features. This means the model applies no transformation to any input feature other than multiplication by a scalar and addition.

Of course, you can change what you mean by ‘input features’ and apply some other (non-linear) transformations to the features first, but strictly speaking that is not part of the linear regression. Saying that linear regression is linear w.r.t the parameters is just a shorthand way of saying that only linear combinations of the inputs, with the parameters as coefficients, are allowed.
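To make the distinction concrete, here is a minimal C++ sketch (standard library only) that fits y ≈ w·φ(x) by least squares with a single weight and no bias. The names `fit_single_weight`, `identity`, and `square` are hypothetical and exist only for this illustration. With φ(x) = x the model is linear in the raw input. With φ(x) = x² it is no longer linear in x, but it is still linear in the parameter w, so the same closed-form solution applies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two candidate feature maps: the raw input and a non-linear transform of it.
double identity(double x) { return x; }
double square(double x) { return x * x; }

// Least-squares fit of y ~ w * phi(x) with one weight and no bias.
// Closed form: w = sum(phi(x_i) * y_i) / sum(phi(x_i)^2).
double fit_single_weight(const std::vector<double>& xs,
                         const std::vector<double>& ys,
                         double (*phi)(double)) {
    assert(xs.size() == ys.size() && !xs.empty());
    double num = 0.0;
    double den = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double f = phi(xs[i]);  // the transformed "input feature"
        num += f * ys[i];
        den += f * f;
    }
    assert(den != 0.0);  // all-zero features would make w undefined
    return num / den;
}
```

On toy data generated from y = 2x² at x = 1, 2, 3, 4 (so y = 2, 8, 18, 32), `fit_single_weight(xs, ys, square)` recovers w = 2, while `fit_single_weight(xs, ys, identity)` gives about 6.67 and a poor fit. The fitting procedure is identical in both cases; only the definition of the input feature changed.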

I have a question about regression using MXNet. I am trying to use it on an expression like this, where w0 and x0 should be scalars:

auto stage1 = w0*x0*x0;

auto net = LinearRegressionOutput("linreg", stage1, Symbol::Variable("label"));

My problem is that it is not converging. I have pasted the whole code just in case.

```
#include <iostream>
#include <map>
#include <string>
#include <vector>

#include "mxnet-cpp/MxNetCpp.h"

using namespace std;
using namespace mxnet::cpp;

int main(int argc, char** argv)
{
    const float learning_rate = 0.01;

    vector<mx_float> input = {1.0, 3.0, 5.3, 8.0, 6};
    vector<mx_float> output = {3.1415, 9.4245, 16.64995, 25.132, 18.849};

    Context ctx0 = Context::cpu();

    // Symbolic model: prediction = w0 * x * x
    auto x0 = Symbol::Variable("x");
    auto w0 = Symbol::Variable("w0");
    auto stage1 = w0*x0*x0;
    auto net = LinearRegressionOutput("linreg", stage1, Symbol::Variable("label"));

    NDArray daInputs = NDArray(input ,Shape(1,input.size()),ctx0);
    NDArray daOutputs = NDArray(output,Shape(1,input.size()),ctx0);

    std::map<string, NDArray> args0;

    Optimizer* opt = OptimizerRegistry::Find("adam");
    opt->SetParam("lr", learning_rate);//->SetParam("wd", weight_decay);

    int epoch =100000;
    auto arg_names = net.ListArguments();

    while(epoch--)
    {
        // One sample at a time
        for (int s=0 ; s != input.size(); s++)
        {
            args0["x"] = NDArray({input[s]},Shape(1,1),ctx0);
            args0["label"] = NDArray({output[s]},Shape(1,1),ctx0);
            auto exec0 = net.SimpleBind(ctx0, args0);

            exec0->Forward(true);
            exec0->Backward();
            for(int i = 0 ; i != arg_names.size(); i++)
            {
                if (arg_names[i] == "w0")
                    opt->Update(i, exec0->arg_arrays[i], exec0->grad_arrays[i]);
                cout << arg_names[i] << "=" << exec0->arg_arrays[i] << endl;
            }
            delete exec0;
        }
    }

    return 1;
}
```

@mli In the gradient descent formula (3.1.10), aren’t you always changing all the coefficients together by the same value? Shouldn’t this step be performed for each coefficient separately?
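A small sketch may help with this question. It assumes that (3.1.10) is the minibatch stochastic gradient descent update, in which the parameters (w, b) are moved by the learning rate times the average gradient of the per-example squared loss over the minibatch. Under that assumption the gradient is a vector, so each coefficient is changed by its own partial derivative; only the learning rate and the moment of the update are shared. The function name `sgd_step` and the toy minibatch are hypothetical, and the code uses only the C++ standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One minibatch SGD step for y_hat = w . x + b with squared loss
// l = 0.5 * (y_hat - y)^2. Updates w and b in place and returns the step
// subtracted from each coefficient (one entry per weight, bias step last).
std::vector<double> sgd_step(std::vector<double>& w, double& b,
                             const std::vector<std::vector<double>>& X,
                             const std::vector<double>& y, double lr) {
    assert(X.size() == y.size() && !X.empty());
    std::vector<double> grad_w(w.size(), 0.0);
    double grad_b = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        assert(X[i].size() == w.size());
        double y_hat = b;
        for (std::size_t j = 0; j < w.size(); ++j) y_hat += w[j] * X[i][j];
        const double err = y_hat - y[i];
        // dl/dw_j = err * x_j: the partial derivative differs per coefficient
        // because each coefficient multiplies a different feature.
        for (std::size_t j = 0; j < w.size(); ++j) grad_w[j] += err * X[i][j];
        grad_b += err;  // dl/db = err
    }
    const double scale = lr / static_cast<double>(X.size());
    std::vector<double> steps;
    for (std::size_t j = 0; j < w.size(); ++j) {
        const double step = scale * grad_w[j];
        w[j] -= step;
        steps.push_back(step);
    }
    const double b_step = scale * grad_b;
    b -= b_step;
    steps.push_back(b_step);
    return steps;
}
```

For example, starting from w = (0, 0) and b = 0 with the minibatch x = (1, 2), y = 1 and x = (3, 4), y = 2 and a learning rate of 0.1, the returned steps are -0.35, -0.5, and -0.15, so the new parameters are w = (0.35, 0.5) and b = 0.15. One update, three different changes.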