Speech gap inpainting with generation adversarial network

WANG Jie, GUAN Yuansheng, HU Wenlin

Journal of Shaanxi Normal University (Natural Science Edition), 2022, Vol. 50, Issue 6: 39-48.

Abstract

Existing audio inpainting methods have several limitations: the repairable segment is short, the methods are restricted to highly repetitive music audio, and working on spectrograms introduces distortion in the inverse transform. To address these problems, a new generative adversarial network (GAN) for long speech inpainting is proposed. The network takes the raw speech signal as both input and output, which avoids the limitations of spectrogram-based models. First, a context encoder-decoder (codec) is used as the generator to make better use of the available content surrounding the time-domain gap. Second, a speech feature extraction module is added to the discriminator; by learning pitch and phoneme features from the content before and after the gap, it improves training efficiency and generation quality. Objective and subjective evaluations against several other algorithms show that the proposed network achieves outstanding speech inpainting performance, with generated gap lengths of up to 256 ms. Furthermore, an extended model obtained by varying the audio length can stably repair speech gaps of up to 500 ms.
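
To give a sense of scale for the gap lengths quoted in the abstract, the following is a minimal sketch of how a time-domain gap could be expressed in samples and separated from its surrounding context. It is not the authors' implementation. The 16 kHz sample rate and the helper names `gap_length_in_samples`, `split_around_gap`, and `mask_gap` are assumptions made for illustration; the abstract does not state the sample rate used.

```python
from typing import List, Tuple

# Assumption: 16 kHz is a common speech sample rate; the abstract does not state one.
SAMPLE_RATE_HZ = 16_000


def gap_length_in_samples(gap_ms: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Convert a gap duration in milliseconds to a whole number of samples."""
    return round(gap_ms * sample_rate_hz / 1000)


def split_around_gap(
    signal: List[float], gap_start: int, gap_len: int
) -> Tuple[List[float], List[float], List[float]]:
    """Split a waveform into (context before, gap, context after)."""
    if gap_start < 0 or gap_len < 0 or gap_start + gap_len > len(signal):
        raise ValueError("gap must lie inside the signal")
    before = signal[:gap_start]
    gap = signal[gap_start:gap_start + gap_len]
    after = signal[gap_start + gap_len:]
    return before, gap, after


def mask_gap(signal: List[float], gap_start: int, gap_len: int) -> List[float]:
    """Return a copy of the waveform with the gap samples set to zero.

    A generator working in the time domain would see something like this
    masked signal and be asked to fill in the zeroed region.
    """
    before, gap, after = split_around_gap(signal, gap_start, gap_len)
    return before + [0.0] * len(gap) + after


if __name__ == "__main__":
    # Under the 16 kHz assumption, the two gap lengths named in the abstract:
    print(gap_length_in_samples(256))
    print(gap_length_in_samples(500))
```

Under the assumed 16 kHz rate, a 256 ms gap corresponds to 4096 samples and a 500 ms gap to 8000 samples; a different sample rate would scale both figures proportionally.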

Key words

audio inpainting / generative adversarial network / context codec / speech feature extraction

Cite this article

WANG Jie, GUAN Yuansheng, HU Wenlin. Speech gap inpainting with generation adversarial network. Journal of Shaanxi Normal University(Natural Science Edition). 2022, 50(6): 39-48
