Nov 27, 2008

How to Label/Tag Documents?

I am interested in labelling an arbitrary document with predefined or learned labels. I would prefer to use LDA for this problem, because we have a highly scalable implementation of LDA, and LDA can explain each document in terms of topics. Nevertheless, I am doing a survey before programming. I have read the following papers. Could anyone please offer any hints?
  • Simon Lacoste-Julien, Fei Sha, and Michael I. Jordan. DiscLDA: Discriminative learning for dimensionality reduction and classification. NIPS 2008.
  • D. Blei and M. Jordan. Modeling Annotated Data. SIGIR 2003.
  • Chemudugunta, C., Holloway, A., Smyth, P., & Steyvers, M. Modeling Documents by Combining Semantic Concepts with Unsupervised Statistical Learning. In: 7th International Semantic Web Conference, 2008.
  • Learning to Classify Short and Sparse Text & Web with Hidden Topics from Large-scale Data Collections. WWW 2008
  • Qiaozhu Mei, Xuehua Shen, ChengXiang Zhai. Automatic Labeling of Multinomial Topic Models, KDD 2007.
  • Qiaozhu Mei, Dong Xin, Hong Cheng, Jiawei Han, ChengXiang Zhai. Semantic Annotation of Frequent Patterns, ACM TKDD, 1(3), 2007.

5 comments:

theneo said...

I'm doing a similar job in the lab. I tried multi-vocabulary LDA for the task; the results are better than search-based collaborative filtering, but there is still much room for improvement.

theneo said...

There is a lot of great material on this blog. Let me read through it carefully, haha.

Yi Wang said...

Thanks to 知远 for recommending two papers:

Chemudugunta, C., Holloway, A., Smyth, P., & Steyvers, M. (2008). Modeling Documents by Combining Semantic Concepts with Unsupervised Statistical Learning. In: 7th International Semantic Web Conference.

Learning to Classify Short and Sparse Text & Web with Hidden Topics from Large-scale Data Collections. WWW 2008.

Yi Wang said...

知远 recommended two more:

Qiaozhu Mei, Xuehua Shen, ChengXiang Zhai. Automatic Labeling of Multinomial Topic Models, Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07), 2007.

Qiaozhu Mei, Dong Xin, Hong Cheng, Jiawei Han, ChengXiang Zhai. Semantic Annotation of Frequent Patterns, ACM Transactions on Knowledge Discovery from Data (TKDD), 1(3), 2007.

I will have to read these carefully before writing up notes. :-)

子不语 said...

Looking forward to it. :)