What is the correct way to get the gradient calculation time inside the training loop?
If I do this,
```python
import time

begin = time.time()
self.forward_backward(data_batch)
time_spent = time.time() - begin
```
The time_spent value is very small. I think the calculation is carried out asynchronously, outside the Python code.
I also tried to access the gradients to force the calculation, as shown below, but I'm not sure this is correct. I also think there is a GPU memory leak, because I run out of GPU memory even with a small batch size.
```python
begin = time.time()
self.forward_backward(data_batch)
# Try to access the gradients so the real calculations are executed
for index, grad_list in enumerate(self._exec_group.grad_arrays):
    if len(grad_list) > 0:
        # grad_list holds one NDArray per device
        for grad in grad_list:
            grad_np = grad.asnumpy()
time_spent = time.time() - begin
```
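Would an explicit synchronization call before reading the clock be the right approach instead? A minimal sketch of what I have in mind, assuming mx.nd.waitall() blocks until all asynchronously queued operations have completed:

```python
import time
import mxnet as mx

begin = time.time()
self.forward_backward(data_batch)
# Block until all pending asynchronous operations (including the
# gradient computation) have finished, so the timing is meaningful.
mx.nd.waitall()
time_spent = time.time() - begin
```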
Thanks for any help!