Audio Classification with Perl API

John · August 24, 2019, 2:43pm

The MXNet forum does not allow new users to post more than 2 links, so see the README in the code repository for all links.

The goal is to implement a CNN that classifies audio recordings as voice or data transmissions.

The MXNet Perl API is used to classify audio files (currently 2 categories). Results so far are good: with my simple requirements and minimal test data, classification is 100% correct.

The input is radio transmissions (.wav files), each of which is either a human speaking or a data transmission. Previously I did this classification with the SoX voice detection function, which had a much lower success rate.

Unlike Gluon Audio, which uses librosa to extract MFCCs, I am creating spectrograms (PNG image files) as input to the network. I would like to use the Gluon Audio approach, but it currently depends on librosa, which is Python only. Gluon Audio mentions the MXNet FFT operator on CPU as a possible future replacement for this dependency, so hopefully it can be used at some point.

Although the use of machine learning for my requirements is probably overkill, I plan on expanding the categories/capability in the future.

It would be great if this helps anyone the way the examples below helped me. I am open to any feedback.

To create training data

WAV file -> extract middle second -> generate spectrogram PNG

Currently ffmpeg is used to generate spectrograms outside the training process. Training data is created by a separate program that uses metadata from a database and audio files from disk. Spectrograms are generated like so:

/usr/bin/ffmpeg -i audio.wav -lavfi showspectrumpic=s=100x50:scale=log:legend=off audio.png
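
The command above covers the last step of the pipeline, but the post does not show how the middle second is extracted. A minimal sketch of the whole WAV -> middle second -> PNG step might look like the following. It assumes ffprobe is available to read the clip duration and uses ffmpeg's -ss and -t input options to select the window; the function names middle_start and wav_to_spectrogram are hypothetical, not taken from the project.

```shell
#!/usr/bin/env bash
# Sketch only: the original post does not show how the middle second is
# extracted. Function names here are hypothetical.

# Print the start offset (in seconds) of the middle one-second window
# for a clip of the given duration. Clips shorter than 1 s start at 0.
middle_start() {
    awk -v d="$1" 'BEGIN { s = (d - 1) / 2; if (s < 0) s = 0; printf "%.3f\n", s }'
}

# Generate a spectrogram PNG from the middle second of a WAV file.
# Assumes ffprobe and ffmpeg are on PATH.
wav_to_spectrogram() {
    local wav="$1" png="$2" duration start
    # Ask ffprobe for the container duration as a bare number of seconds.
    duration=$(ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$wav") || return 1
    start=$(middle_start "$duration")
    # Seek to the window, read one second, and render it with the same
    # showspectrumpic settings as the command in the post.
    ffmpeg -y -ss "$start" -t 1 -i "$wav" \
        -lavfi showspectrumpic=s=100x50:scale=log:legend=off "$png"
}
```

With these functions sourced, a single file could be converted with `wav_to_spectrogram audio.wav audio.png`. Doing the seek inside the same ffmpeg call avoids writing an intermediate one-second WAV, though a two-step approach would work just as well.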

The spectrograms should be placed in a folder structure as documented in ImageFolderDataset.
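
ImageFolderDataset expects one subdirectory per category under a root folder, with the image files inside (for example root/voice/a.png and root/data/b.png). A small sketch of filing a generated spectrogram into that layout is below; the helper name place_spectrogram and the label names voice and data are assumptions for illustration, since the post does not say what the two categories are called on disk.

```shell
#!/usr/bin/env bash
# Sketch only: helper name and label names are hypothetical.

# Copy a spectrogram PNG into <root>/<label>/, creating the label
# directory if it does not exist yet.
place_spectrogram() {
    local root="$1" label="$2" png="$3"
    mkdir -p "$root/$label" && cp "$png" "$root/$label/$(basename "$png")"
}
```

For example, `place_spectrogram train voice audio.png` would produce train/voice/audio.png, and the train folder could then be passed as the dataset root.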

Dependencies

MXNet pull request against ImageFolderDataset
ffmpeg

Based on these examples

Sergey Kolychev’s mnist.pl
Sergey Kolychev’s Machine learning in Perl, Part3
Eryk Wdowiak’s MXNet in Perl

ThomasDelteil · September 1, 2019, 5:39pm

Hi @John,

Thanks for sharing your project. Spectrogram => CNN is indeed a sound approach to audio classification, and I am glad you’ve had good success with it and the Perl API. Good luck with your project.
