Probably everyone knows that the standard Adam implementation handles weight decay incorrectly, and that AdamW was proposed almost two years ago to fix this.
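To make the distinction concrete, here is a minimal scalar sketch of the two update rules, written in plain Python rather than MXNet. The function names `adam_l2_step` and `adamw_step` are hypothetical, and scaling the decoupled decay by `lr` is just one common convention. This is an illustration only and does not claim to match what MXNet's `adamw_update` operator computes.

```python
import math

def adam_l2_step(w, g, m, v, t, lr=1e-3, wd=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """Adam with weight decay folded into the gradient as an L2 term."""
    g = g + wd * w                      # decay passes through the adaptive scaling
    m = b1 * m + (1 - b1) * g           # first-moment estimate
    v = b2 * v + (1 - b2) * g * g       # second-moment estimate
    m_hat = m / (1 - b1 ** t)           # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

def adamw_step(w, g, m, v, t, lr=1e-3, wd=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """Adam with decoupled weight decay (assumed convention: decay scaled by lr)."""
    m = b1 * m + (1 - b1) * g           # moments see only the loss gradient
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w)
    return w, m, v

# One step from w = 1.0 with a zero loss gradient, so only the decay acts.
w_l2, _, _ = adam_l2_step(1.0, 0.0, 0.0, 0.0, t=1)
w_dec, _, _ = adamw_step(1.0, 0.0, 0.0, 0.0, t=1)
print(w_l2, w_dec)
```

With a zero loss gradient, the L2 variant moves the weight by roughly `lr` no matter how small `wd` is, because the decay term is normalized by the second-moment estimate. The decoupled variant shrinks the weight by `lr * wd * w`, as intended.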
There are files such as
src/operator/contrib/adamw.cu in the repo, and there is an
mx.optimizer.contrib.contrib.adamw_update function, but no
mxnet.optimizer.AdamW class exists.
Can I use AdamW in MXNet now, or is it yet to be implemented?
P.S. I use the mxnet-cu102 Python package.