Yoshiki Masuyama, Natsuki Ueno, Nobutaka Ono
Evaluation on Speech Signals:
This section shows examples of the reconstructed speech signals. The original speech signals are from a subset of the TIMIT dataset [1] and were resampled to 16 kHz. For more details, please refer to our paper.
| | PG-GLA [2] | ADMM-GLA [3] | iPALM-Joint [4] | ADMM-Joint (Proposed) |
| 100 iterations | | | | |
| 500 iterations | | | | |
| | PG-GLA [2] | ADMM-GLA [3] | iPALM-Joint [4] | ADMM-Joint (Proposed) |
| 100 iterations | | | | |
| 500 iterations | | | | |
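To make the iteration counts in the tables concrete, the following is a minimal sketch of the classic Griffin-Lim style iteration described in [2], which alternates between enforcing the target magnitude and enforcing consistency with a time-domain signal. It is not the implementation used to produce the samples on this page. The function names (`stft`, `istft`, `griffin_lim`) and the settings (`n_fft=512`, `hop=128`, Hann window, random initial phase) are illustrative assumptions, and the sketch assumes a full-band magnitude spectrogram is already available rather than a mel-spectrogram.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """STFT with a periodic Hann window; returns an array of shape (frames, bins)."""
    win = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=512, hop=128):
    """Least-squares inverse STFT via windowed overlap-add."""
    win = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * win
    length = n_fft + hop * (spec.shape[0] - 1)
    x = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        x[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    # Guard against division by zero where the window sum vanishes (signal edges).
    return x / np.maximum(norm, 1e-8)


def griffin_lim(magnitude, n_iter=100, n_fft=512, hop=128, seed=0):
    """Estimate a signal whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Assumption: start from uniformly random phase.
    spec = magnitude * np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        # Step 1: project onto spectrograms that correspond to a real signal.
        consistent = stft(istft(spec, n_fft, hop), n_fft, hop)
        # Step 2: keep the new phase but restore the target magnitude.
        spec = magnitude * np.exp(1j * np.angle(consistent))
    return istft(spec, n_fft, hop)


if __name__ == "__main__":
    fs = 16000  # matches the speech sampling rate stated above
    t = np.arange(4096) / fs
    x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1234 * t)
    target = np.abs(stft(x))
    y = griffin_lim(target, n_iter=100)
    rel_err = np.linalg.norm(np.abs(stft(y)) - target) / np.linalg.norm(target)
    print(f"relative magnitude error after 100 iterations: {rel_err:.4f}")
```

Running the loop for more iterations generally lowers the mismatch between the magnitude of the reconstructed signal's STFT and the target magnitude, which is why the tables above compare 100 and 500 iterations.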
Evaluation on Music and Environmental Signals:
This section shows examples of foley sounds reconstructed with 500 iterations. The original foley sounds are from the development set of DCASE2023 Task 7 [5] and are sampled at 22.05 kHz. For more details, please refer to our paper.
| | Original | iPALM-Joint [4] | ADMM-Joint (Proposed) |
| DogBark | | | |
| Footstep | | | |
References:
[1] P. Mowlaee, J. Kulmer, J. Stahl, and F. Mayer, “Single Channel Phase-Aware Signal Processing in Speech Communication: Theory and Practice,” Wiley, 2016.
[2] D. Griffin and J. Lim, “Signal Estimation from Modified Short-Time Fourier Transform,” IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 2, pp. 236-243, Apr. 1984.
[3] Y. Masuyama, K. Yatabe, and Y. Oikawa, “Griffin-Lim Like Phase Recovery via Alternating Direction Method of Multipliers,” IEEE Signal Process. Lett., vol. 26, pp. 184-188, Jan. 2019.
[4] Y. Masuyama, N. Ueno, and N. Ono, “Signal Reconstruction from Mel-Spectrogram Based on Bi-Level Consistency of Full-Band Magnitude and Phase,” IEEE Workshop Appl. Signal Process. Audio Acoust., Oct. 2023.
[5] K. Choi, J. Im, L. Heller, B. McFee, K. Imoto, Y. Okamoto, M. Lagrange, and S. Takamichi, “Foley Sound Synthesis at the DCASE 2023 Challenge,” 2023.