Mel-Spectrogram Inversion via Alternating Direction Method of Multipliers

Yoshiki Masuyama, Natsuki Ueno, Nobutaka Ono


Evaluation on Speech Signals:

This section presents examples of the reconstructed speech signals. The original speech signals are taken from a subset of the TIMIT dataset [1] and resampled to 16 kHz. For more details, please refer to our paper.

Speech 1 (ground-truth)


PG-GLA [2] ADMM-GLA [3] iPALM-Joint [4] ADMM-Joint (Proposed)
100 iterations
500 iterations

Speech 2 (ground-truth)


PG-GLA [2] ADMM-GLA [3] iPALM-Joint [4] ADMM-Joint (Proposed)
100 iterations
500 iterations


Evaluation on Music and Environmental Signals:

This section presents examples of foley sounds reconstructed with 500 iterations. The original foley sounds are taken from the development sets of DCASE2023 Task 7 [5] and are sampled at 22.05 kHz. For more details, please refer to our paper.

Original iPALM-Joint [4] ADMM-Joint (Proposed)
DogBark
Footstep


References:

[1] P. Mowlaee, J. Kulmer, J. Stahl, and F. Mayer, “Single Channel Phase-Aware Signal Processing in Speech Communication: Theory and Practice,” Wiley, 2016.
[2] D. Griffin and J. Lim, “Signal Estimation from Modified Short-Time Fourier Transform,” IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 2, pp. 236-243, Apr. 1984.
[3] Y. Masuyama, K. Yatabe, and Y. Oikawa, “Griffin-Lim Like Phase Recovery via Alternating Direction Method of Multipliers,” IEEE Signal Process. Lett., vol. 26, pp. 184-188, Jan. 2019.
[4] Y. Masuyama, N. Ueno, and N. Ono, “Signal Reconstruction from Mel-Spectrogram Based on Bi-Level Consistency of Full-Band Magnitude and Phase,” IEEE Workshop Appl. Signal Process. Audio Acoust., Oct. 2023.
[5] K. Choi, J. Im, L. Heller, B. McFee, K. Imoto, Y. Okamoto, M. Lagrange, and S. Takamichi, “Foley Sound Synthesis at the DCASE 2023 Challenge,” 2023.