Online spectrogram inversion for low-latency audio source separation

Paul Magron and Tuomas Virtanen

IEEE Signal Processing Letters, Vol. 27, pp. 306-310, 2020

On this page, we provide several audio excerpts that illustrate the speech separation experiments presented in the paper. The excerpts are taken from the Danish HINT dataset and the Wall Street Journal dataset.

The differences between the methods are most noticeable in the Oracle setting (where the magnitude spectra are known). The amplitude mask yields estimates with audible interference, which highlights the need for advanced phase recovery. The MISI algorithm yields estimates with very little interference, but it operates offline.
The proposed oMISI algorithm (used here with one future frame) leads to comparable sound quality, with the advantage of operating online (in this case the latency is 24 ms).
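
For readers who want a concrete picture of what the phase recovery step does, the following is a minimal sketch of the offline MISI iteration in Python with NumPy. It is not the code used in the paper: the function names (stft, istft, misi), the window, and the default frame and hop sizes are assumptions chosen for illustration. With n_iter=0 it reduces to combining the target magnitudes with the mixture phase, which corresponds to the amplitude mask baseline in the Oracle setting. The online oMISI variant is not shown.

```python
import numpy as np


def _window(n_fft):
    # Periodic Hann window (an assumption; any analysis window could be used).
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft=512, hop=128):
    """Return a (frames, bins) complex STFT of a 1-D signal."""
    win = _window(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(X, length, n_fft=512, hop=128):
    """Least-squares inverse STFT by windowed overlap-add."""
    win = _window(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=1) * win
    x = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        x[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    return x / np.maximum(norm, 1e-8)


def misi(mixture, magnitudes, n_iter=5, n_fft=512, hop=128):
    """Sketch of offline MISI: recover source phases from target magnitudes.

    mixture:    1-D time-domain mixture.
    magnitudes: list of (frames, bins) magnitude spectrograms, one per source.
    Returns an array of shape (n_sources, len(mixture)).
    """
    n_src = len(magnitudes)
    length = len(mixture)
    # Initialize every source with the mixture phase.
    mix_phase = np.angle(stft(mixture, n_fft, hop))
    S = [V * np.exp(1j * mix_phase) for V in magnitudes]
    for _ in range(n_iter):
        # Back to the time domain, then spread the mixing error over the sources.
        s = [istft(Sj, length, n_fft, hop) for Sj in S]
        err = (mixture - sum(s)) / n_src
        # Keep the new phase, but reimpose the target magnitudes.
        S = [V * np.exp(1j * np.angle(stft(sj + err, n_fft, hop)))
             for V, sj in zip(magnitudes, s)]
    return np.stack([istft(Sj, length, n_fft, hop) for Sj in S])
```

In the Oracle setting, magnitudes would be the magnitude spectrograms of the clean sources, for example [np.abs(stft(s1)), np.abs(stft(s2))] for two hypothetical source signals s1 and s2 whose sum is the mixture.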