GPU memory fluctuates with the Nesterov accelerated SGD (NAG) optimizer?


#1

I didn't observe this with Adam or SGD. GPU memory usage was about 3.2 GB during initialization. After that it was 6.9 GB, then 6.0 GB, then 4.6 GB, and so on, until an out-of-memory error was reported:

/home/ubuntu/src/mxnet/dmlc-core/include/dmlc/./logging.h:308: src/storage/./pooled_storage_manager.h:102: cudaMalloc failed: out of memory

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fe2e107bf0c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet7storage23GPUPooledStorageManager5AllocEm+0x15e) [0x7fe2e20acbae]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet11StorageImpl5AllocEmNS_7ContextE+0x69) [0x7fe2e20b00b9]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(+0x175f895) [0x7fe2e20d4895]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x93) [0x7fe2e209ccb3]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5_+0x123) [0x7fe2e20a59d3]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7fe2e209f13a]
[bt] (7) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fe2f7d0cc80]
[bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fe2fd9dc6ba]
[bt] (9) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fe2fd7123dd]

/home/ubuntu/src/mxnet/dmlc-core/include/dmlc/./logging.h:308: src/engine/./threaded_engine.h:347: src/storage/./pooled_storage_manager.h:102: cudaMalloc failed: out of memory

For comparison, GPU memory usage is about 6 GB with Adam and 4.5 GB with SGD.

I'd appreciate any help.


#2

Different optimizers keep different amounts of internal state (e.g. momentum buffers, diagonal preconditioners, etc.), which is why they need different amounts of memory. Have a look at optimizer.py in the MXNet codebase to see what's going on.
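To make the "internal state" point concrete, here is a rough back-of-the-envelope sketch. The per-optimizer buffer counts are an assumption based on the usual algorithms (plain SGD keeps nothing, momentum SGD and NAG keep one momentum buffer per weight, Adam keeps first- and second-moment estimates); check `create_state()` in optimizer.py for your MXNet version. The names `STATE_BUFFERS` and `optimizer_state_bytes` and the 25M-parameter model size are hypothetical, for illustration only.

```python
# Assumed number of weight-sized state buffers each optimizer keeps per parameter.
# These counts follow the standard algorithms; verify against create_state()
# in MXNet's optimizer.py for the version you run.
STATE_BUFFERS = {
    "sgd": 0,           # plain SGD (momentum=0): no persistent state
    "sgd_momentum": 1,  # one momentum buffer per weight
    "nag": 1,           # Nesterov: one momentum buffer per weight
    "adam": 2,          # mean and variance estimates per weight
}


def optimizer_state_bytes(num_params: int, optimizer: str, bytes_per_elem: int = 4) -> int:
    """Estimate persistent optimizer-state memory in bytes.

    Excludes the weights and gradients themselves, and any temporaries
    allocated during the update step. bytes_per_elem=4 assumes float32.
    """
    return num_params * STATE_BUFFERS[optimizer] * bytes_per_elem


if __name__ == "__main__":
    n = 25_000_000  # hypothetical model with 25M float32 parameters
    for name in STATE_BUFFERS:
        mib = optimizer_state_bytes(n, name) / 2**20
        print(f"{name:>13}: {mib:8.1f} MiB of optimizer state")
```

This only counts the steady-state buffers, so it gives a lower bound on the extra memory each optimizer needs rather than a full accounting of what the GPU shows.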

TL;DR - if you’re really short of memory, use plain SGD.
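The difference between plain SGD and NAG can be seen in a minimal pure-Python sketch of one update step for each. The NAG update here uses one common reformulation (buf = mu * buf + g; w -= lr * (g + mu * buf)), which is an assumption and not necessarily the exact form MXNet's NAG implements. The function names are hypothetical. The point is that NAG needs a persistent buffer the same size as the weights, while plain SGD needs none.

```python
from typing import List


def sgd_step(weights: List[float], grads: List[float], lr: float) -> None:
    """Plain SGD: update weights in place; no optimizer state is kept."""
    for i, g in enumerate(grads):
        weights[i] -= lr * g


def nag_step(weights: List[float], grads: List[float],
             momentum_buf: List[float], lr: float, mu: float) -> None:
    """Nesterov momentum (one common reformulation).

    momentum_buf must be the same length as weights and persist across
    steps, which is the extra memory that plain SGD avoids.
    """
    for i, g in enumerate(grads):
        momentum_buf[i] = mu * momentum_buf[i] + g
        weights[i] -= lr * (g + mu * momentum_buf[i])


if __name__ == "__main__":
    w_sgd = [1.0, -2.0]
    w_nag = [1.0, -2.0]
    buf = [0.0] * len(w_nag)  # extra state, one entry per weight
    grads = [0.5, -0.5]
    for _ in range(3):
        sgd_step(w_sgd, grads, lr=0.1)
        nag_step(w_nag, grads, buf, lr=0.1, mu=0.9)
    print("SGD:", w_sgd)
    print("NAG:", w_nag)
```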