Gradient becomes NaN for a simple computational graph involving exp

Defining y = 1/(1 + exp(x)) and using automatic differentiation to calculate dy/dx can easily produce a NaN when x is large; see the example below.

I think I understand why: in float32, exp(100.) overflows to infinity, so the backward pass ends up computing infinity divided by infinity. Still, this is somewhat of a hassle for many kinds of loss calculations. The derivative, dy/dx = -exp(x) / (1 + exp(x))^2, is well defined (it simply approaches 0) even for fairly large x.

I am using MXNet 1.5.1.

****** simple code ***************

import numpy as np
import mxnet as mx
from mxnet import nd, autograd

x = nd.array([-100000., 0., 1., 100.])
x.attach_grad()
with autograd.record():
    y = 1. / (1. + nd.exp(x))  # exp(100.) overflows float32 to inf
y.backward()
dx = x.grad.asnumpy()
print(dx)

output is [-0. -0.25 -0.19661194 nan]
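A possible workaround, assuming the goal is just to evaluate this particular expression and its gradient stably: 1/(1 + exp(x)) is the same as sigmoid(-x), and MXNet's nd.sigmoid operator computes its gradient from the operator's output, so the overflowing exp(x) never appears in the backward pass. A minimal sketch (expected values are approximate, not verified on 1.5.1):

****** workaround sketch ***************

import mxnet as mx
from mxnet import nd, autograd

x = nd.array([-100000., 0., 1., 100.])
x.attach_grad()
with autograd.record():
    # sigmoid(-x) == 1 / (1 + exp(x)); the gradient is y * (1 - y),
    # computed from the output, so no inf/inf ratio is formed
    y = nd.sigmoid(-x)
y.backward()
print(x.grad.asnumpy())  # roughly [-0. -0.25 -0.1966 -0.]

For a general loss that contains this term, rewriting it in terms of the built-in sigmoid (or log-sigmoid) style operators should avoid the overflow in the same way.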