TY - JOUR
T1 - Document recommendation using data compression
AU - Suzuki, Takafumi
AU - Hasegawa, Shin
AU - Hamamoto, Takayuki
AU - Aizawa, Akiko
N1 - Funding Information:
We were supported by a Grant-in-Aid for Scientific Research for Young Scientists (B) (23700288) from the Ministry of Education, Culture, Sports, Science and Technology, Japan, and by a joint research grant from the National Institute of Informatics. Earlier versions of this study were presented at the 2009 and 2010 Annual Meetings of the Institute of Electronics, Information and Communication Engineers and at the 2009 Annual Conference of the Japanese Society for Artificial Intelligence. We would like to thank the participants of these meetings for their helpful comments.
PY - 2011
Y1 - 2011
AB - We propose a new method of content-based document recommendation using data compression. Whereas previous studies mainly used bags-of-words to calculate the similarity between the profile and target documents, users in fact focus on units larger than words when searching for information in documents. To take this into account, we base document recommendation on data compression. Experimental results on Japanese newspaper corpora showed that (a) data compression performed better than the bag-of-words method, especially when the number of topics was large; (b) our new method outperformed the previous data compression method; and (c) combining data compression with bag-of-words can also improve performance. We conclude that our method better captures users' profiles and thus contributes to a better document recommendation system.
KW - Data compression
KW - LZ78
KW - PRDC
KW - document classification
KW - document recommendation
UR - http://www.scopus.com/inward/record.url?scp=83755162445&partnerID=8YFLogxK
U2 - 10.1016/j.sbspro.2011.10.593
DO - 10.1016/j.sbspro.2011.10.593
M3 - Conference article
AN - SCOPUS:83755162445
SN - 1877-0428
VL - 27
SP - 150
EP - 159
JO - Procedia - Social and Behavioral Sciences
JF - Procedia - Social and Behavioral Sciences
T2 - Conference on Pacific Association for Computational Linguistics, PACLING 2011
Y2 - 19 July 2011 through 21 July 2011
ER -