LARS implementation is different from the paper

huyangc · November 21, 2018, 1:39am

apache/incubator-mxnet/blob/master/python/mxnet/optimizer/optimizer.py#L788-L791


def _l2norm(self, v):
    "inner product implementation"
    # Returns sum(v * v), i.e. the *squared* L2 norm; no square root is taken here.
    norm = multiply(v, v).asnumpy().sum()
    return norm

Here `_l2norm` returns the squared norm without taking the square root; the square root is applied later, in `_get_lars`. Is this a bug or intentional?
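
Whether the deferred square root changes the result depends on how it is applied. The NumPy sketch below compares two approaches. The first takes the square root of each norm separately, as in the paper. The second takes a single square root over a ratio of squared norms. NumPy stands in for MXNet's NDArray here. `squared_l2norm`, `ratio_paper` and `ratio_sqrt_of_squares` are hypothetical helpers, not MXNet functions, and the actual body of `_get_lars` is not reproduced. The trust coefficient η is left out for simplicity. Under these assumptions the two agree when the weight-decay coefficient is zero and diverge otherwise.

```python
import numpy as np

def squared_l2norm(v):
    # Same quantity as the quoted _l2norm: sum of squares, no square root.
    return float(np.multiply(v, v).sum())

def ratio_paper(w, g, beta):
    # ||w|| / (||g|| + beta * ||w||), each norm square-rooted individually.
    w_norm = np.sqrt(squared_l2norm(w))
    g_norm = np.sqrt(squared_l2norm(g))
    return w_norm / (g_norm + beta * w_norm)

def ratio_sqrt_of_squares(w, g, beta):
    # Hypothetical variant (not taken from MXNet): one sqrt over squared norms.
    w2 = squared_l2norm(w)
    g2 = squared_l2norm(g)
    return np.sqrt(w2 / (g2 + beta * w2))

w = np.array([3.0, 4.0])  # ||w|| = 5
g = np.array([0.0, 2.0])  # ||g|| = 2

print(ratio_paper(w, g, 0.0), ratio_sqrt_of_squares(w, g, 0.0))  # 2.5 2.5
print(ratio_paper(w, g, 0.1), ratio_sqrt_of_squares(w, g, 0.1))  # about 2.0 vs about 1.961
```

With β = 0 the single outer square root is harmless, because sqrt(a² / b²) = a / b. With β ≠ 0 the weight-decay term sits inside the square root, so the result no longer matches ‖w‖ / (‖g‖ + β‖w‖).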

The TensorFlow implementation, by contrast, follows the paper exactly.
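
For reference, the LARS paper ("Large Batch Training of Convolutional Networks", You, Gitman and Ginsburg) defines the local learning rate of layer l as shown below. Here η is the trust coefficient and β is the weight decay, and the global learning rate is multiplied by λ^l. In the TensorFlow code, these correspond to `eeta` and `weight_decay`. That code also adds a small `epsilon` to the denominator and falls back to a ratio of 1.0 when either norm is zero.

$$
\lambda^{l} = \eta \times \frac{\lVert w^{l} \rVert_2}{\lVert \nabla L(w^{l}) \rVert_2 + \beta \, \lVert w^{l} \rVert_2}
$$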


tensorflow/tensorflow/blob/master/tensorflow/contrib/opt/python/training/lars_optimizer.py#L101-L115


def compute_lr(self, grad, var):
  scaled_lr = self._learning_rate
  if self._skip_list is None or not any(v in var.name
                                        for v in self._skip_list):
    w_norm = linalg_ops.norm(var, ord=2)
    g_norm = linalg_ops.norm(grad, ord=2)
    trust_ratio = array_ops.where(
        math_ops.greater(w_norm, 0),
        array_ops.where(
            math_ops.greater(g_norm, 0),
            (self._eeta * w_norm /
             (g_norm + self._weight_decay * w_norm + self._epsilon)), 1.0),
        1.0)
    scaled_lr = self._learning_rate * trust_ratio
  return scaled_lr
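
A framework-free NumPy sketch of the same logic can help when checking numbers by hand. It mirrors the nested `array_ops.where` guards in `compute_lr`: the ratio is 1.0 if either norm is zero. The function names and the example values of `eeta` and `weight_decay` are illustrative assumptions, not TensorFlow defaults.

```python
import numpy as np

def trust_ratio(w, g, eeta, weight_decay, epsilon=0.0):
    """NumPy sketch of the trust ratio computed in compute_lr above."""
    # 2-norm over all entries (flattened), square root already applied.
    w_norm = np.linalg.norm(np.ravel(w))
    g_norm = np.linalg.norm(np.ravel(g))
    # Same fallbacks as the nested where() calls: 1.0 if either norm is zero.
    if w_norm > 0 and g_norm > 0:
        return eeta * w_norm / (g_norm + weight_decay * w_norm + epsilon)
    return 1.0

def scaled_lr(learning_rate, w, g, eeta, weight_decay, epsilon=0.0):
    # Global learning rate times the layer-wise trust ratio.
    return learning_rate * trust_ratio(w, g, eeta, weight_decay, epsilon)

w = np.array([3.0, 4.0])  # ||w|| = 5
g = np.array([0.0, 2.0])  # ||g|| = 2
print(trust_ratio(w, g, eeta=0.001, weight_decay=0.1))            # 0.001 * 5 / 2.5, about 0.002
print(trust_ratio(np.zeros(2), g, eeta=0.001, weight_decay=0.1))  # 1.0 (zero-weight fallback)
```

Because `np.linalg.norm` returns the square-rooted norm directly, this version never handles squared norms. The question of where the square root is applied does not arise.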
