Online spectrogram inversion for low-latency audio source separation

Paul Magron and Tuomas Virtanen
IEEE Signal Processing Letters, Vol. 27, pp. 306-310, 2020
[Paper] [Code]

On this page, we provide several audio excerpts that illustrate the speech separation experiments presented in the paper. The excerpts are taken from the Danish HINT dataset and the Wall Street Journal (WSJ) dataset.

The difference between the methods is most noticeable in the Oracle setting (where the magnitude spectra are known). We can hear that the amplitude mask yields estimates with residual interference, which highlights the need for advanced phase recovery. The MISI algorithm yields estimates with very little interference, but it operates offline.
The proposed oMISI algorithm (used here with one future frame) leads to comparable sound quality, with the advantage of operating online (in this case the latency is 24 ms).
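
As a rough illustration of why the amplitude mask leaves room for phase recovery, the sketch below looks at a single time-frequency bin. It assumes that the amplitude mask pairs each source magnitude with the phase of the mixture, which is a common baseline. The function name `amplitude_mask` and the toy values are hypothetical and are not taken from the paper's code.

```python
import cmath

def amplitude_mask(mixture_bin, magnitudes):
    """Pair each source magnitude with the phase of the mixture bin.

    Assumption: this is the usual definition of the amplitude-mask baseline.
    """
    phase = cmath.phase(mixture_bin)
    return [cmath.rect(mag, phase) for mag in magnitudes]

# Hypothetical single time-frequency bin with two sources.
s1 = cmath.rect(1.0, 0.0)            # true source 1: magnitude 1, phase 0
s2 = cmath.rect(1.0, cmath.pi / 2)   # true source 2: magnitude 1, phase pi/2
x = s1 + s2                          # mixture bin

# Oracle setting: the true magnitudes are known, only the phase is borrowed.
est1, est2 = amplitude_mask(x, [abs(s1), abs(s2)])

# Phase error of the first estimate, in radians.
phase_error_1 = abs(cmath.phase(est1) - cmath.phase(s1))

# Mismatch between the mixture and the sum of the estimates.
mixing_error = abs(x - (est1 + est2))

print(f"phase error: {phase_error_1:.3f} rad, mixing error: {mixing_error:.3f}")
```

Even with exact magnitudes, the borrowed phase is wrong for both sources and the estimates no longer sum to the mixture. This mixing error is the kind of mismatch that MISI-type algorithms iteratively reduce when refining the phase.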


HINT: Male + Female

Mixture:  Male (clean):  Female (clean): 


Male Female

Estim. Oracle Estim. Oracle
Amplitude mask
MISI
oMISI-mix
oMISI-sin

HINT: Male + Male

Mixture:  Male 1 (clean):  Male 2 (clean): 


Male 1 Male 2

Estim. Oracle Estim. Oracle
Amplitude mask
MISI
oMISI-mix
oMISI-sin

HINT: Female + Female

Mixture:  Female 1 (clean):  Female 2 (clean): 


Female 1 Female 2

Estim. Oracle Estim. Oracle
Amplitude mask
MISI
oMISI-mix
oMISI-sin

WSJ: Male + Female

Mixture:  Male (clean):  Female (clean): 


Male Female

Estim. Oracle Estim. Oracle
Amplitude mask
MISI
oMISI-mix
oMISI-sin

WSJ: Male + Male

Mixture:  Male 1 (clean):  Male 2 (clean): 


Male 1 Male 2

Estim. Oracle Estim. Oracle
Amplitude mask
MISI
oMISI-mix
oMISI-sin

WSJ: Female + Female

Mixture:  Female 1 (clean):  Female 2 (clean): 


Female 1 Female 2

Estim. Oracle Estim. Oracle
Amplitude mask
MISI
oMISI-mix
oMISI-sin