Speech source discrimination method for plural voice user interfaces environment

Tatsuhiko Imaizumi, Takahiro Yoshida

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Voice user interfaces (VUIs), which let users control devices such as smart speakers and smartphones by voice, are becoming more popular. However, in an environment with several VUI devices nearby, a VUI device may mistakenly respond to voices played back by other devices, such as another VUI device's response voice, narration, or hands-free telephone audio. In this study, we therefore propose a speech source discrimination method for environments with multiple VUIs, using a convolutional neural network (CNN). The method uses mel-frequency cepstrum coefficients (MFCCs), a feature commonly used in speech signal processing, as the input features for the CNN. Experimental results confirmed that, for voices whose speakers and sentences are the same as those in the training data, the proposed method can discriminate among four kinds of speech sources, including direct voice, playback voice, and synthesized voice, with 97.5% accuracy. In addition, we improved the discrimination accuracy for speakers and sentences that differ from the training data by splitting the speech data.
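
The abstract does not say how the speech data split is carried out. As a rough illustration only, one plausible reading is that each utterance is cut into shorter fixed-length segments that are classified separately. The sketch below assumes this reading; the function names, the segment length, the hop size, and the majority-vote step that merges per-segment predictions are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def split_segments(samples, segment_len, hop):
    """Cut a sequence of audio samples into fixed-length segments.

    Assumption: trailing samples that do not fill a whole segment are dropped.
    """
    if segment_len <= 0 or hop <= 0:
        raise ValueError("segment_len and hop must be positive")
    return [
        samples[start:start + segment_len]
        for start in range(0, len(samples) - segment_len + 1, hop)
    ]


def majority_vote(segment_labels):
    """Merge per-segment predictions into one label (hypothetical step)."""
    if not segment_labels:
        raise ValueError("no segment predictions to combine")
    return Counter(segment_labels).most_common(1)[0][0]


# Toy usage: 10 samples, segments of 4 samples, hop of 2 samples.
segments = split_segments(list(range(10)), segment_len=4, hop=2)
utterance_label = majority_vote(["direct", "playback", "direct"])
```

In a real pipeline, each segment would be converted to MFCC features and passed to the trained CNN; that part is omitted here because the abstract gives no architecture or parameter details.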

Original language: English
Title of host publication: 2019 IEEE 8th Global Conference on Consumer Electronics, GCCE 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1081-1083
Number of pages: 3
ISBN (Electronic): 9781728135755
DOIs: https://doi.org/10.1109/GCCE46687.2019.9015607
Publication status: Published - Oct 2019
Event: 8th IEEE Global Conference on Consumer Electronics, GCCE 2019 - Osaka, Japan
Duration: 15 Oct 2019 to 18 Oct 2019

Publication series

Name: 2019 IEEE 8th Global Conference on Consumer Electronics, GCCE 2019

Conference

Conference: 8th IEEE Global Conference on Consumer Electronics, GCCE 2019
Country: Japan
City: Osaka
Period: 15/10/19 to 18/10/19

Keywords

  • Convolutional neural network
  • Mel-frequency cepstrum coefficient
  • Speech source discrimination
  • Voice user interface


  • Cite this

    Imaizumi, T., & Yoshida, T. (2019). Speech source discrimination method for plural voice user interfaces environment. In 2019 IEEE 8th Global Conference on Consumer Electronics, GCCE 2019 (pp. 1081-1083). [9015607] (2019 IEEE 8th Global Conference on Consumer Electronics, GCCE 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/GCCE46687.2019.9015607