Task estimation using latent semantic analysis of visual scenes and spoken words

Masashi Kimura, Shinta Sawada, Yurie Iribe, Kouichi Katsurada, Tsuneo Nitta

Research output: Contribution to journal › Article


In this paper, we propose a task estimation method based on multiple subspaces extracted from the multi-modal information of image objects in visual scenes and spoken words in dialog that appear in the same task. The multiple subspaces are obtained using latent semantic analysis (LSA). In the proposed method, a task vector composed of spoken-word frequencies and image-object appearance frequencies is extracted first, and then the similarities between the input task vector and the reference subspaces of different tasks are compared. Experiments are conducted on the identification of game tasks. Experimental results show that the proposed method with multi-modal information outperforms methods in which only a single modality, image or spoken dialog, is used. Moreover, the proposed method achieves accurate performance even when less spoken dialog is available.
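The subspace-comparison step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature values, task names, and the choice of subspace dimension are all hypothetical, and the LSA subspaces are taken as the top left singular vectors of each task's reference matrix.

```python
import numpy as np

def task_subspace(X, k=2):
    """Top-k left singular vectors of a feature-by-sample matrix X,
    spanning an LSA-style reference subspace for one task."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]

def subspace_similarity(v, basis):
    """Similarity of vector v to a subspace: cosine between v and its
    orthogonal projection onto the subspace."""
    proj = basis @ (basis.T @ v)
    return float(np.dot(v, proj) / (np.linalg.norm(v) * np.linalg.norm(proj) + 1e-12))

# Hypothetical reference matrices: rows are spoken-word / image-object
# frequencies, columns are observed task vectors for each task.
task_a_refs = np.array([[3., 4., 3.],
                        [2., 2., 3.],
                        [1., 1., 1.],
                        [0., 0., 0.],
                        [0., 1., 0.]])
task_b_refs = task_a_refs[::-1]  # a task with the opposite usage profile

bases = {"A": task_subspace(task_a_refs), "B": task_subspace(task_b_refs)}

# An input task vector resembling task A is classified by comparing
# its similarity against each task's reference subspace.
query = np.array([3., 2., 1., 0., 0.])
estimated = max(bases, key=lambda t: subspace_similarity(query, bases[t]))
print(estimated)  # -> A
```

In the paper the input vector concatenates both modalities; the same comparison would apply with image-only or speech-only features, which is the single-modality baseline the experiments compare against.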

Original language: English
Pages (from-to): 1473-1480+13
Journal: IEEJ Transactions on Electronics, Information and Systems
Issue number: 9
Publication status: Published - 2012



Keywords

  • Latent Semantic Analysis
  • Multi-modal Processing
  • Task Estimation
