Hi, I'm looking at an MXNet video classification model (st-resnet) at https://github.com/jay1204/st-resnet/blob/master/model/video_iter.py, where the
read_train_frames function reads each video from a folder of pictures:
frames_name = [f for f in sorted(os.listdir(video_path)) if f.endswith('.jpg')]
I have the following question:
Is it correct practice for video classification tasks to pre-process the data by converting the videos to downsampled .jpg frames? Or is the more common approach to have the
dataset handle reading, downsampling, and extracting arrays directly from .mp4 or some other video format?
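
To make the first option concrete, here is a minimal sketch of what I understand the pre-extracted-frames approach to be. Only the listing expression comes from the linked file; list_frames, sample_frames, and the step parameter are hypothetical names I made up for illustration, not the repo's API, and the actual image decoding is left out.

```python
import os


def list_frames(video_path):
    """Return the sorted .jpg frame file names found in one video's folder."""
    # Same listing expression as in the linked read_train_frames function.
    return [f for f in sorted(os.listdir(video_path)) if f.endswith('.jpg')]


def sample_frames(frames_name, step=2):
    """Temporal downsampling: keep every `step`-th frame name.

    Hypothetical helper; `step` is an assumed parameter, not taken from the repo.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return frames_name[::step]
```

The second option would replace list_frames with a video decoder that is opened inside the dataset and yields arrays directly, so no frame folders would need to exist on disk.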