TY - GEN
T1 - Speech Synthesis from IPA Sequences through EMA Data
AU - Maruyama, Koki
AU - Sawada, Shun
AU - Ohmura, Hidefumi
AU - Katsurada, Kouichi
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Previous research on articulatory synthesis has partially implemented and verified speech synthesis systems that mimic the human vocalization process. This study proposes a speech synthesis system that replicates the human vocalization process more faithfully by constructing a model that generates speech from an International Phonetic Alphabet (IPA) sequence through articulatory movement data. To train and evaluate this model, we used the ATR phoneme-balanced 503-sentence electromagnetic articulography (EMA) database, which contains paired speech and EMA data from a male Japanese speaker. The experimental results showed that using articulatory movement data as intermediate information improved the quality of the synthesized speech on objective measures such as mel-cepstral distortion (MCD), phoneme error rate (PER), and perceptual evaluation of speech quality (PESQ). In particular, the effectiveness of using articulatory movement data was confirmed for speech whose articulation points were included in the EMA database.
UR - http://www.scopus.com/inward/record.url?scp=85218195052&partnerID=8YFLogxK
U2 - 10.1109/APSIPAASC63619.2025.10849075
DO - 10.1109/APSIPAASC63619.2025.10849075
M3 - Conference contribution
AN - SCOPUS:85218195052
T3 - APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
BT - APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024
Y2 - 3 December 2024 through 6 December 2024
ER -