Hi there,
I re-trained the ‘ssd_512_resnet50_v1_custom’ model on a custom dataset, and now I wanted to estimate the FPS for inference on a GeForce RTX 2080 Ti.
I am using this code:
import time

import cv2
import numpy as np
import mxnet as mx
import gluoncv as gcv
from gluoncv import model_zoo


def main():
    # Use GPU 1 if available, otherwise fall back to CPU
    try:
        a = mx.nd.zeros((1,), ctx=mx.gpu(1))
        ctx = [mx.gpu(1)]
    except:
        ctx = [mx.cpu()]

    # -------------------------
    # Load model
    # -------------------------
    classes = ['Guitar', 'face']
    net = model_zoo.get_model('ssd_512_resnet50_v1_custom', ctx=ctx, classes=classes, pretrained_base=False)
    net.load_parameters('saved_weights/test_000/ep_30.params')

    # Load the webcam handler
    cap = cv2.VideoCapture("video/video_01.mp4")

    count_frame = 0

    loading_frame_FPSs = np.zeros(844)
    pre_processing_FPSs = np.zeros(844)
    inference_FPSs = np.zeros(844)
    total_FPSs = np.zeros(844)

    while True:
        print(f"Frame: {count_frame}")

        total_t_frame = 0

        #######
        start_t = time.time()
        #######
        # Load frame from the camera
        ret, frame = cap.read()
        #######
        stop_t = time.time()
        total_t_frame += (stop_t - start_t)
        FPS = 1 / (stop_t - start_t)
        loading_frame_FPSs[count_frame] = FPS
        print(f"\tloading frame time = {stop_t - start_t} -> FPS = {FPS}")
        #######

        if (cv2.waitKey(25) & 0xFF == ord('q')) or (ret == False):
            cv2.destroyAllWindows()
            cap.release()
            break

        #######
        start_t = time.time()
        #######
        # Image pre-processing
        frame = mx.nd.array(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).astype('uint8')
        rgb_nd, frame = gcv.data.transforms.presets.ssd.transform_test(frame, short=512, max_size=700)
        #######
        stop_t = time.time()
        total_t_frame += (stop_t - start_t)
        FPS = 1 / (stop_t - start_t)
        pre_processing_FPSs[count_frame] = FPS
        print(f"\timage pre-processing time = {stop_t - start_t} -> FPS = {FPS}")
        #######

        #######
        start_t = time.time()
        #######
        # Run frame through network
        class_IDs, scores, bounding_boxes = net(rgb_nd)
        #######
        stop_t = time.time()
        total_t_frame += (stop_t - start_t)
        FPS = 1 / (stop_t - start_t)
        inference_FPSs[count_frame] = FPS
        print(f"\tinference time = {stop_t - start_t} -> FPS = {FPS}")
        #######

        print(f"\tTotal frame FPS = {1 / total_t_frame}")
        total_FPSs[count_frame] = 1 / total_t_frame

        count_frame += 1

    cv2.destroyAllWindows()
    cap.release()

    print("Average FPS for:")
    print(f"\tloading frame: {np.average(loading_frame_FPSs)}")
    print(f"\tpre-processing frame: {np.average(pre_processing_FPSs)}")
    print(f"\tinference frame: {np.average(inference_FPSs)}")
    print(f"\ttotal process: {np.average(total_FPSs)}")


if __name__ == "__main__":
    main()
So, basically, I'm measuring the time required for each step (frame loading, pre-processing, inference) and computing the corresponding FPS for each step and for the whole pipeline.
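One thing I'm not sure about is whether these numbers fully account for MXNet's asynchronous execution, since net(rgb_nd) returns before the GPU work is actually done. As a sanity check, a minimal variant of the inference timing block (same net and rgb_nd as above) could force synchronization before stopping the timer:

start_t = time.time()
# Run frame through network
class_IDs, scores, bounding_boxes = net(rgb_nd)
# Block until the (asynchronous) GPU work has actually finished
mx.nd.waitall()
stop_t = time.time()
print(f"\tinference time (synchronized) = {stop_t - start_t}")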
Looking at the output:

Average FPS for:
	loading frame: 813.3313447171636
	pre-processing frame: 10.488629638752457
	inference frame: 101.50787170217922
	total process: 9.300166489874748

it seems that the bottleneck is mostly the image pre-processing.
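To narrow down where that pre-processing time actually goes, I was thinking of timing the two sub-steps (the colour conversion / NDArray copy and transform_test) separately. A small stand-alone sketch of the idea:

import time
import cv2
import mxnet as mx
import gluoncv as gcv

cap = cv2.VideoCapture("video/video_01.mp4")
ret, frame = cap.read()

t0 = time.time()
# Colour conversion + copy into an MXNet NDArray
nd_frame = mx.nd.array(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).astype('uint8')
t1 = time.time()
# SSD test-time transform (resize + normalization + batching)
rgb_nd, resized = gcv.data.transforms.presets.ssd.transform_test(nd_frame, short=512, max_size=700)
t2 = time.time()
print(f"cvtColor + NDArray copy: {t1 - t0:.4f} s, transform_test: {t2 - t1:.4f} s")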
When checking the output of nvidia-smi, I get:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:18:00.0 Off |                  N/A |
| 36%   63C    P0    78W / 250W |     10MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:3B:00.0 Off |                  N/A |
| 37%   65C    P2    84W / 250W |    715MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  Off  | 00000000:86:00.0 Off |                  N/A |
| 36%   63C    P0    71W / 250W |     10MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 208...  Off  | 00000000:AF:00.0 Off |                  N/A |
| 27%   34C    P8    10W / 250W |    165MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
which I guess is reasonable: since inference runs on a single image at a time, I don't expect the GPU usage to be as high as it is during training.
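I was also wondering whether hybridizing the network after loading the weights would make single-image inference faster or use the GPU better. A sketch of what I had in mind (I haven't verified how much difference it makes here):

net = model_zoo.get_model('ssd_512_resnet50_v1_custom', ctx=ctx, classes=classes, pretrained_base=False)
net.load_parameters('saved_weights/test_000/ep_30.params', ctx=ctx)
# Build a static computation graph for inference
net.hybridize(static_alloc=True, static_shape=True)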
At this point, however, there are a couple of things I'm not sure about:

- When reading about the average FPS of SSD models, they are usually reported to be in the 25-30 FPS range. How do I get to those values? Is it all about the image pre-processing?

- I tried to replace the block

      try:
          a = mx.nd.zeros((1,), ctx=mx.gpu(1))
          ctx = [mx.gpu(1)]
      except:
          ctx = [mx.cpu()]

  with simply

      ctx = mx.gpu(1)

  but this way the process seems to run on the CPU (not even those 715 MiB are allocated on the GPU). Why is that?
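To understand where everything actually ends up in that second case, my plan was to check the contexts explicitly inside the script, along these lines (using the same net and rgb_nd as above):

print(mx.context.num_gpus())        # how many GPUs MXNet can see
print(rgb_nd.context)               # context of the input batch
for name, param in net.collect_params().items():
    print(name, param.list_ctx())   # context(s) the parameters live on
    break                           # the first parameter is enough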