OOM when trying to slice and print an NDArray

Hello,

I am trying to use the predict functionality.

When I try to slice the predictions and convert them to NumPy, I get an OOM error:

import mxnet as mx

for preds, i_batch, batch in pred_model.iter_predict(test_data_iter, num_batch=3):
    print(len(preds))              # number of output arrays
    print(preds[0].shape)          # shape of the first output
    print(preds[0].context)        # device the output lives on
    a = preds[0][1][:10]           # first 10 values of row 1
    y = a.as_in_context(mx.cpu())  # copy the slice to CPU -- this triggers the OOM
    print(y)

Output:

1
(1000, 2997129)
gpu(0)


---------------------------------------------------------------------------
MXNetError                                Traceback (most recent call last)
<timed exec> in <module>()

~/mxnet/incubator-mxnet/python/mxnet/ndarray/ndarray.py in wait_to_read(self)
   1714         0.0893700122833252
   1715         """
-> 1716         check_call(_LIB.MXNDArrayWaitToRead(self.handle))
   1717 
   1718     @property

~/mxnet/incubator-mxnet/python/mxnet/base.py in check_call(ret)
    147     """
    148     if ret != 0:
--> 149         raise MXNetError(py_str(_LIB.MXGetLastError()))
    150 
    151 

MXNetError: [23:54:07] src/operator/tensor/./../mxnet_op.h:576: Check failed: err == cudaSuccess (2 vs. 0) Name: mxnet_generic_kernel ErrStr:out of memory

Stack trace returned 10 entries:
[bt] (0) /home/ec2-user/mxnet/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace()+0x4a) [0x7f5c82d3377a]
[bt] (1) /home/ec2-user/mxnet/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x21) [0x7f5c82d33d81]
[bt] (2) /home/ec2-user/mxnet/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(void mxnet::op::mxnet_op::Kernel<mxnet::op::mxnet_op::set_to_int<0>, mshadow::gpu>::Launch<int*>(mshadow::Stream<mshadow::gpu>*, int, int*)+0x16d) [0x7f5c85ce724d]
[bt] (3) /home/ec2-user/mxnet/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(void mxnet::op::SparseEmbeddingOpForwardRspImpl<mshadow::gpu>(mxnet::OpContext const&, mxnet::TBlob const&, mxnet::NDArray const&, mxnet::OpReqType, mxnet::TBlob const&)+0x1907) [0x7f5c868823c7]
[bt] (4) /home/ec2-user/mxnet/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(void mxnet::op::SparseEmbeddingOpForwardEx<mshadow::gpu>(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x82d) [0x7f5c86aafdcd]
[bt] (5) /home/ec2-user/mxnet/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(+0x366a766) [0x7f5c85761766]
[bt] (6) /home/ec2-user/mxnet/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x589) [0x7f5c856ea369]
[bt] (7) /home/ec2-user/mxnet/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0xeb) [0x7f5c856faf7b]
[bt] (8) /home/ec2-user/mxnet/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#3}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>)+0x46) [0x7f5c856fb1c6]
[bt] (9) /home/ec2-user/mxnet/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x44) [0x7f5c856f7dd4]
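
For scale: preds[0] has shape (1000, 2997129), so assuming a dense float32 output (an assumption on my part about the dtype), a single prediction array is already about 11 GiB, which by itself could exhaust the GPU's memory:

# back-of-the-envelope size of one prediction array, assuming dense float32
rows, cols = 1000, 2997129            # shape printed above
size_gib = rows * cols * 4 / 1024**3  # 4 bytes per float32 element
print(f"{size_gib:.1f} GiB")          # ~11.2 GiB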

This may be related to https://github.com/apache/incubator-mxnet/pull/11742
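
Since MXNet executes operations asynchronously, I suspect the OOM actually comes from the embedding forward pass and only surfaces once the slice/copy calls wait_to_read(). A minimal sketch of what I plan to try to confirm where the failure happens, forcing evaluation right after each batch is predicted:

for preds, i_batch, batch in pred_model.iter_predict(test_data_iter, num_batch=3):
    preds[0].wait_to_read()  # block until the forward pass for this output has finished
    # alternatively, mx.nd.waitall() flushes the whole async engine queue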