Model-based STFT phase recovery for audio source separation

Paul Magron, Roland Badeau and Bertrand David.
IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 26, no. 6, pp. 1095-1105, June 2018
[Paper] [Code]

On this page, we provide several audio excerpts that complete and illustrate the experiments presented in the paper.


Phase unwrapping vs Griffin-Lim

In this experiment, we compare the performance of the phase unwrapping (PU) and Griffin-Lim (GL) algorithms on a piano piece extracted from the MAPS dataset ("Cataluña" by Isaac Albéniz).
The magnitudes are either known (oracle) or corrupted (replaced by an estimate obtained with an NMF), and the phase is randomized within non-onset frames before being estimated with the algorithms.
The signals estimated with the PU algorithm may be corrupted by some artifacts: musical noise for a short window and phasiness for a long window. The GL algorithm performs better with long windows than with short ones. Overall, for an intermediate analysis window (92 ms), both algorithms lead to similar results.
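To give an idea of what phase unwrapping does within non-onset frames, here is a minimal sketch, assuming a stationary sinusoidal model in which each frequency bin carries a single known normalized frequency. The function name `unwrap_phase` and its parameters are hypothetical and chosen for illustration; the algorithm in the paper also has to estimate these frequencies and handle onset frames, which is not shown here.

```python
import numpy as np

def unwrap_phase(initial_phase, norm_freq, hop, n_frames):
    """Propagate phases over time under a stationary-sinusoid assumption.

    initial_phase: (n_bins,) phases of the first frame, in radians
    norm_freq:     (n_bins,) normalized frequencies, in cycles per sample
                   (assumed known here; this is an illustrative input)
    hop:           hop size between consecutive frames, in samples
    n_frames:      number of frames to generate
    """
    phase = np.empty((len(initial_phase), n_frames))
    phase[:, 0] = initial_phase
    for t in range(1, n_frames):
        # A sinusoid advances by 2*pi * frequency * hop between two frames.
        phase[:, t] = phase[:, t - 1] + 2 * np.pi * hop * norm_freq
    # Wrap to (-pi, pi]; the complex exponential is unchanged by the wrapping.
    return np.angle(np.exp(1j * phase))
```

The resulting phases would then be combined with the oracle or NMF magnitudes `V` as `V * np.exp(1j * phase)` before inverting the STFT.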

Original

Oracle magnitudes

Window length Random phases Griffin-Lim Phase unwrapping
12 ms
92 ms
370 ms

Corrupted magnitudes

Window length Random phases Griffin-Lim Phase unwrapping
12 ms
92 ms
370 ms

Source separation - Initialization of the iterative procedure

Here, we illustrate the benefit of the PU initialization for the iterative procedure, as detailed in Section IV-C of the paper. The song is "One minute smile" by Actions (from the DSD100 dataset). The benefit of the proposed approach is mainly audible in the bass and drum tracks, compared to the other initializations. This listening evaluation, though informal, is consistent with the results in terms of source separation quality indicators shown in Table I.

Mixture


Bass Drum Other Vocals
Original sources
Mixture phase
Random phase
Phase unwrapping

Source separation - Comparison to other methods

We present here the results of the source separation experiment in the Oracle (song "One minute smile" by Actions) and Informed (song "Signs" by Zeno) scenarios. Both songs are extracted from the DSD100 dataset.

As in the previous experiment, the main benefits of the proposed method can be heard in the "bass" and "drum" tracks, where a significant reduction of artifacts is obtained.
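For readers unfamiliar with the Wiener filtering baseline listed in the tables of this section, the following is a minimal sketch of the standard soft-masking form, assuming the squared source magnitudes are used as variance estimates. The function name `wiener_separate` and the small constant `eps` are illustrative choices, not taken from the paper's code. Note that every estimated source inherits the phase of the mixture, which is the limitation that phase-aware methods aim to address.

```python
import numpy as np

def wiener_separate(mixture_stft, source_magnitudes, eps=1e-12):
    """Estimate source STFTs by soft masking of the mixture STFT.

    mixture_stft:      complex array of shape (F, T)
    source_magnitudes: nonnegative array of shape (J, F, T), one per source
    eps:               small constant to avoid division by zero (assumption)
    """
    # Use squared magnitudes as per-source variance estimates.
    power = source_magnitudes ** 2
    # Each mask is the share of the total power assigned to one source.
    mask = power / (power.sum(axis=0, keepdims=True) + eps)
    # Masks are real and nonnegative, so each estimate keeps the mixture phase.
    return mask * mixture_stft
```

Because the masks sum to one in each time-frequency bin (up to `eps`), the estimated sources add up to the mixture.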

Oracle scenario

Mixture


Bass Drum Other Vocals
Original sources
Wiener filtering
Consistent Wiener filtering
Proposed method

Informed scenario

Mixture


Bass Drum Other Vocals
Original sources
Wiener filtering
Consistent Wiener filtering
Proposed method