Frame-level data iterator

MXNet has few kinds of data iterators suited to speech research.

Could anyone contribute a frame-level data iterator for speech recognition?

To the best of my knowledge, frame-level training is more common in the speech field.
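
To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is plain Python and does not use the MXNet API. The function name `frame_level_batches` and its parameters are hypothetical, and I am assuming each utterance is stored as a list of feature vectors with one label per frame. A real iterator would probably also need to handle context windows and loading from disk.

```python
import random


def frame_level_batches(utterances, batch_size, shuffle=True, seed=0):
    """Yield (features, labels) mini-batches made of individual frames.

    utterances: list of (frames, labels) pairs (assumed layout), where
    frames is a list of feature vectors and labels is a list of per-frame
    targets of the same length.
    """
    # Flatten all utterances into one index of (utterance, frame) positions.
    index = []
    for u, (frames, labels) in enumerate(utterances):
        if len(frames) != len(labels):
            raise ValueError("utterance %d: frames/labels length mismatch" % u)
        index.extend((u, t) for t in range(len(frames)))

    # Shuffle across utterance boundaries so a batch mixes many utterances.
    if shuffle:
        random.Random(seed).shuffle(index)

    # Emit batches of batch_size frames; the last batch may be smaller.
    for start in range(0, len(index), batch_size):
        chunk = index[start:start + batch_size]
        feats = [utterances[u][0][t] for u, t in chunk]
        labs = [utterances[u][1][t] for u, t in chunk]
        yield feats, labs
```

The point is that batches are drawn from single frames across all utterances, instead of from whole utterances.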