Speech Synthesis from IPA Sequences through EMA Data

Koki Maruyama, Shun Sawada, Hidefumi Ohmura, Kouichi Katsurada

Research output: Conference contribution, peer-reviewed

Abstract

Previous research on articulatory synthesis has partially implemented and verified speech synthesis systems that mimic the human vocalization process. This study proposes a speech synthesis system that replicates the human vocalization process more faithfully by constructing a model that generates speech from an International Phonetic Alphabet (IPA) sequence through articulatory movement data. To train and evaluate the model, we used the ATR phoneme-balanced 503-sentence electromagnetic articulography (EMA) database, which contains paired speech and EMA data from a male Japanese speaker. The experimental results showed that using articulatory movement data as intermediate information improved the quality of the synthesized speech on objective measures such as mel-cepstral distortion (MCD), phoneme error rate (PER), and perceptual evaluation of speech quality (PESQ). In particular, the effectiveness of using articulatory movement data was confirmed for speech whose articulation points were included in the EMA database.
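As a rough illustration of the pipeline the abstract describes, the sketch below wires an IPA-symbol encoder to an EMA-trajectory predictor and then to an acoustic decoder. This is a minimal sketch, not the authors' architecture: the GRU layers, the 12-channel EMA representation, the 80-band mel-spectrogram target, and all dimensionalities are illustrative assumptions, and the sketch sidesteps the duration/alignment modeling a real system needs because IPA symbols and acoustic frames occur at different rates.

# Minimal two-stage sketch (illustrative, not the paper's model):
# IPA symbol IDs -> EMA trajectories -> acoustic frames.
import torch
import torch.nn as nn

class IPAToEMA(nn.Module):
    """Stage 1: predict articulatory (EMA) trajectories from IPA symbol IDs."""
    def __init__(self, n_symbols=128, emb_dim=64, hidden=128, ema_dim=12):
        super().__init__()
        self.embed = nn.Embedding(n_symbols, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, ema_dim)

    def forward(self, ipa_ids):               # (batch, seq_len) of symbol IDs
        h, _ = self.rnn(self.embed(ipa_ids))  # (batch, seq_len, 2 * hidden)
        return self.proj(h)                   # (batch, seq_len, ema_dim)

class EMAToSpeech(nn.Module):
    """Stage 2: map EMA trajectories to acoustic frames (here, 80-band mel)."""
    def __init__(self, ema_dim=12, hidden=128, n_mels=80):
        super().__init__()
        self.rnn = nn.GRU(ema_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, n_mels)

    def forward(self, ema):                   # (batch, seq_len, ema_dim)
        h, _ = self.rnn(ema)
        return self.proj(h)                   # (batch, seq_len, n_mels)

# Toy end-to-end forward pass over a random 20-symbol IPA sequence.
stage1, stage2 = IPAToEMA(), EMAToSpeech()
ipa_ids = torch.randint(0, 128, (1, 20))      # batch of one sequence
mel = stage2(stage1(ipa_ids))
print(mel.shape)                              # torch.Size([1, 20, 80])

Factoring the model into two stages in this way is what allows the intermediate EMA trajectories to be supervised directly against the measured articulator positions in the paired speech/EMA database.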

Original language: English
Title of host publication: APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9798350367331
DOI
Publication status: Published - 2024
Event: 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024 - Macau, China
Duration: 3 Dec 2024 - 6 Dec 2024

Publication series

Name: APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024

Conference

Conference: 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024
Country/Territory: China
City: Macau
Period: 3/12/24 - 6/12/24
