Document recommendation using data compression

Takafumi Suzuki, Shin Hasegawa, Takayuki Hamamoto, Akiko Aizawa

Research output: Contribution to journal › Conference article › peer-review

2 Citations (Scopus)

Abstract

We propose a new method of content-based document recommendation using data compression. Previous studies mainly used bags-of-words to calculate the similarity between the profile and target documents, but users in fact focus on units larger than words when searching for information in documents. To take this into account, we base the recommendation on data compression rather than on individual words. Experimental results using Japanese newspaper corpora showed that (a) data compression performed better than the bag-of-words method, especially when the number of topics was large; (b) our new method outperformed the previous data compression method; and (c) a combination of data compression and bag-of-words can also improve performance. We conclude that our method better captures users' profiles and thus contributes to a better document recommendation system.
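The general idea of compression-based similarity can be illustrated with a minimal sketch. It is not the authors' method, whose details the abstract does not give. It assumes a simplified scheme in which an LZ78-style phrase dictionary is built from the profile documents and each candidate is scored by how few dictionary phrases are needed to cover it. All function names (`build_lz78_dictionary`, `count_codewords`, `compression_ratio`, `recommend`) are hypothetical.

```python
def build_lz78_dictionary(text):
    """Collect the phrases that an LZ78 parse of `text` adds to its dictionary."""
    phrases = set()
    current = ""
    for ch in text:
        current += ch
        if current not in phrases:
            # New phrase = longest known phrase plus one character.
            phrases.add(current)
            current = ""
    return phrases


def count_codewords(text, phrases):
    """Greedily cover `text` with the longest known phrases.

    A character that starts no known multi-character phrase costs one codeword.
    """
    max_len = max((len(p) for p in phrases), default=1)
    i = 0
    count = 0
    while i < len(text):
        step = 1
        # Try the longest candidate first, down to length 2.
        for length in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + length] in phrases:
                step = length
                break
        i += step
        count += 1
    return count


def compression_ratio(text, phrases):
    """Codewords per character; lower means closer to the profile (assumed score)."""
    if not text:
        return 1.0
    return count_codewords(text, phrases) / len(text)


def recommend(profile_docs, candidates):
    """Rank candidates from most to least compressible with the profile dictionary."""
    phrases = build_lz78_dictionary(" ".join(profile_docs))
    return sorted(candidates, key=lambda doc: compression_ratio(doc, phrases))
```

Because the dictionary stores multi-character phrases, a candidate that shares long strings with the profile is covered by fewer codewords than one that shares only isolated characters, which is one way to capture units larger than words.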

Original language: English
Pages (from-to): 150-159
Number of pages: 10
Journal: Procedia - Social and Behavioral Sciences
Volume: 27
DOIs
Publication status: Published - 2011
Event: Conference on Pacific Association for Computational Linguistics, PACLING 2011 - Kuala Lumpur, Malaysia
Duration: 19 Jul 2011 to 21 Jul 2011

Keywords

  • Data compression
  • LZ78
  • PRDC
  • document classification
  • document recommendation
