Linear Regression

http://d2l.ai/chapter_linear-networks/linear-regression.html

What makes linear regression linear is that we assume that the output truly can be expressed as a linear combination of the input features.

This seems different from what I’ve learned: linear regression is linear w.r.t. the parameters rather than the inputs.

It is actually the case that linear regression assumes the output is a linear combination of the input features. This means the model applies no transformations to the input features other than multiplication by a scalar and addition.

Of course, you can change what you mean by ‘input features’ and apply some other (non-linear) transformations to the features first, but strictly speaking that is not part of linear regression. Saying that linear regression is linear w.r.t. the parameters is just a shorthand way of saying that only linear combinations of the inputs with the parameters as coefficients are allowed.
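To make the distinction concrete, here is a small example of my own (not from the chapter). Suppose we first build a transformed feature $x^2$ and fit

$\hat{y} = w_1 x + w_2 x^2 + b.$

This model is not linear in the raw input $x$, but it is still linear in the parameters $(w_1, w_2, b)$, so the usual linear regression machinery (least squares, the same gradient updates) applies unchanged; the non-linearity lives entirely in the feature construction step, which is exactly the ‘change what you mean by input features’ caveat above.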

I have a question about regression using MXNet. I am trying to use it on an expression like this, where w0 and x0 should be scalars:

auto stage1 = w0 * x0 * x0;
auto net = LinearRegressionOutput("linreg", stage1, Symbol::Variable("label"));

My problem is that it’s not converging, so I paste the whole code just in case.

#include <iostream>
#include <map>
#include <string>
#include <vector>
#include "mxnet-cpp/MxNetCpp.h"

using namespace std;
using namespace mxnet::cpp;

int main(int argc, char** argv)
{
  const float learning_rate = 0.01;

  vector<mx_float> input =
  {
    1.0,
    3.0,
    5.3,
    8.0,
    6
  };
  vector<mx_float> output =
  {
    3.1415,
    9.4245,
    16.64995,
    25.132,
    18.849
  };

  Context ctx0 = Context::cpu();

  auto x0 = Symbol::Variable("x");
  auto w0 = Symbol::Variable("w0");

  // Model: label ~ w0 * x0 * x0, trained against a squared-error output layer.
  auto stage1 = w0 * x0 * x0;
  auto net = LinearRegressionOutput("linreg", stage1, Symbol::Variable("label"));

  NDArray daInputs = NDArray(input, Shape(1, input.size()), ctx0);
  NDArray daOutputs = NDArray(output, Shape(1, input.size()), ctx0);

  std::map<string, NDArray> args0;

  Optimizer* opt = OptimizerRegistry::Find("adam");
  opt->SetParam("lr", learning_rate); // ->SetParam("wd", weight_decay);

  int epoch = 100000;

  auto arg_names = net.ListArguments();

  while (epoch--)
  {
    // One sample at a time: bind, run forward/backward, update only w0.
    for (size_t s = 0; s != input.size(); s++)
    {
      args0["x"] = NDArray({input[s]}, Shape(1, 1), ctx0);
      args0["label"] = NDArray({output[s]}, Shape(1, 1), ctx0);
      auto exec0 = net.SimpleBind(ctx0, args0);

      exec0->Forward(true);
      exec0->Backward();

      for (size_t i = 0; i != arg_names.size(); i++)
      {
        if (arg_names[i] == "w0")
          opt->Update(i, exec0->arg_arrays[i], exec0->grad_arrays[i]);
        cout << arg_names[i] << "=" << exec0->arg_arrays[i] << endl;
      }
      delete exec0;
    }
  }
  return 0;
}
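For what it is worth, the targets above are (to rounding) exactly 3.1415 times the inputs, so there is no value of w0 for which w0 * x0 * x0 matches them exactly, which may be part of why the loss never settles. Below is a minimal, MXNet-free sanity check of my own (not part of the original post) that computes the closed-form least-squares coefficient for a no-bias linear model y ≈ w * x:

#include <iostream>
#include <vector>

int main()
{
  // Data copied from the post above.
  std::vector<double> x = {1.0, 3.0, 5.3, 8.0, 6.0};
  std::vector<double> y = {3.1415, 9.4245, 16.64995, 25.132, 18.849};

  // For y ~ w * x with no bias, least squares gives w = sum(x*y) / sum(x*x).
  double sxy = 0.0, sxx = 0.0;
  for (size_t i = 0; i != x.size(); ++i)
  {
    sxy += x[i] * y[i];
    sxx += x[i] * x[i];
  }

  std::cout << "least-squares w for y ~ w*x: " << sxy / sxx << std::endl;
  return 0;
}

Running it prints roughly 3.1415, which is what a single-coefficient linear fit should converge to on this data.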

@mli In the gradient descent formula (3.1.10), aren’t you always changing all the coefficients together by the same value? Shouldn’t this step be performed for each coefficient separately?
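For reference, my reading of that formula (I may be paraphrasing the chapter’s exact notation): the update is vector-valued,

$(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)}\, l^{(i)}(\mathbf{w}, b),$

which, written out per coefficient for the squared loss, becomes

$w_j \leftarrow w_j - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} x_j^{(i)} \left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right).$

So although the step size $\eta$ is shared, each coefficient $w_j$ is moved by its own partial derivative (the factor $x_j^{(i)}$ differs across coefficients), not by one common value.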