Nov 26, 2008

Learning Materials on Non-Parametric Bayesian Methods and Topic Modeling

In recent months, I have been learning non-parametric Bayesian
methods for topic modeling. Here are some documents I have found
helpful in the learning process, and what I want to do using the

0. Measure Theory and sigma-algebra
http://en.wikipedia.org/wiki/Measure_theory
http://en.wikipedia.org/wiki/Sigma-algebra
Measure theory is the foundation of the Dirichlet process and many
other stochastic processes. A sigma-algebra is the collection of
measurable sets on which a measure is defined. Both are key to
generalizing a finite number of latent factors to infinitely many.
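
To make the sigma-algebra axioms concrete, here is a minimal sketch
that checks them on a finite set. The function name and the example
sets are my own, for illustration only. On a finite set, closure under
pairwise unions is enough to give closure under countable unions.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the sigma-algebra axioms for a family of subsets of a finite set."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in family}
    # Axiom 1: the whole space belongs to the family.
    if omega not in sets:
        return False
    # Axiom 2: closed under complement.
    if any(omega - s not in sets for s in sets):
        return False
    # Axiom 3: closed under union (pairwise suffices on a finite set).
    return all(a | b in sets for a, b in combinations(sets, 2))

omega = {1, 2, 3, 4}
coarse = [set(), {1, 2}, {3, 4}, omega]   # a valid sigma-algebra
broken = [set(), {1}, omega]              # the complement of {1} is missing

print(is_sigma_algebra(omega, coarse))    # True
print(is_sigma_algebra(omega, broken))    # False
```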

1. Basics of the Dirichlet Process
http://velblod.videolectures.net/2007/pascal/mlss07_tuebingen/teh_yee_whye/teh_yee_whye_dp_article.pdf
This introduction is a set of course notes written by Yee Whye Teh,
one of the authors of the hierarchical Dirichlet process (HDP).
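
One standard way to picture a draw from a Dirichlet process is the
stick-breaking construction. The sketch below is a truncated version
using only the standard library. The truncation level, the standard
normal base measure, and all names are assumptions of mine, not taken
from the notes.

```python
import random

def stick_breaking(alpha, num_sticks, rng):
    """Return the first num_sticks weights of a DP draw and the unbroken mass."""
    weights = []
    remaining = 1.0
    for _ in range(num_sticks):
        # Break off a Beta(1, alpha) fraction of what is left of the stick.
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining

rng = random.Random(0)
weights, leftover = stick_breaking(alpha=2.0, num_sticks=50, rng=rng)

# Each weight is attached to an atom drawn from the base measure
# (assumed here to be a standard normal).
atoms = [rng.gauss(0.0, 1.0) for _ in weights]
```

A smaller alpha concentrates the mass on the first few atoms, while a
larger alpha spreads it over many.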

2. Dirichlet Process Mixture Models
http://gs2040.sp.cs.cmu.edu/neal.pdf
This paper presents the Dirichlet process mixture model, a mixture
with a (potentially) infinite number of components. It explains how
to generalize traditional mixture models by using a Dirichlet process
as the prior over components. This generalization makes it possible
to infer the number of components from the data using Gibbs sampling
in a tractable amount of running time.
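
As a rough illustration of how Gibbs sampling can discover the number
of components, here is a minimal sketch of a collapsed Gibbs sampler
for a one-dimensional Gaussian mixture with a Dirichlet process prior.
It is not taken from the paper: the known-variance Gaussian likelihood,
the conjugate normal prior on the component means, the toy data, and
all names and hyperparameter values are my own assumptions.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def predictive(x, n, total, mu0, tau2, sigma2):
    """Predictive density of x for a cluster holding n points that sum to total."""
    precision = 1.0 / tau2 + n / sigma2
    mean = (mu0 / tau2 + total / sigma2) / precision
    return normal_pdf(x, mean, 1.0 / precision + sigma2)

def gibbs_dpmm(data, alpha, mu0, tau2, sigma2, sweeps, rng):
    z = [0] * len(data)              # start with every point in one cluster
    counts = {0: len(data)}          # cluster label -> number of points
    sums = {0: sum(data)}            # cluster label -> sum of its points
    next_label = 1
    for _ in range(sweeps):
        for i, x in enumerate(data):
            # Remove point i from its current cluster.
            k = z[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]
            # Existing clusters: weight = size * predictive density.
            labels = list(counts)
            weights = [counts[c] * predictive(x, counts[c], sums[c], mu0, tau2, sigma2)
                       for c in labels]
            # A brand-new cluster: weight = alpha * prior predictive density.
            labels.append(next_label)
            weights.append(alpha * predictive(x, 0, 0.0, mu0, tau2, sigma2))
            k = rng.choices(labels, weights=weights)[0]
            if k == next_label:
                counts[k] = 0
                sums[k] = 0.0
                next_label += 1
            counts[k] += 1
            sums[k] += x
            z[i] = k
    return z

rng = random.Random(1)
# Toy data: two well-separated groups of 30 points each.
data = [rng.gauss(-5.0, 1.0) for _ in range(30)] + [rng.gauss(5.0, 1.0) for _ in range(30)]
z = gibbs_dpmm(data, alpha=1.0, mu0=0.0, tau2=25.0, sigma2=1.0, sweeps=50, rng=rng)
print("clusters in the final sample:", len(set(z)))
```

The number of clusters is never fixed in advance. Each point may open
a new cluster with weight proportional to alpha, and empty clusters
disappear, so the count is itself part of what is sampled.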

3. Hierarchical Dirichlet Process
http://www.cs.berkeley.edu/~jordan/papers/hdp.pdf
Whereas LDA models each document as a mixture of a finite number of
topics, the hierarchical Dirichlet process (HDP) models each document
as a mixture of a (potentially) infinite number of topics, where each
document's mixture is a Dirichlet process mixture model.
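
The HDP paper describes this through the Chinese restaurant franchise
metaphor, which can be simulated in a few lines. The sketch below only
generates topic assignments for the words of each document and leaves
out the word distributions. The function names, document lengths, and
concentration values are my own assumptions.

```python
import random

def crp_pick(counts, concentration, rng):
    """Pick an existing index in proportion to its count, or len(counts) for a new one."""
    weights = counts + [concentration]
    return rng.choices(range(len(weights)), weights=weights)[0]

def chinese_restaurant_franchise(doc_lengths, alpha0, gamma, rng):
    dish_counts = []                 # tables (over all documents) serving each topic
    docs = []
    for n_words in doc_lengths:
        table_counts = []            # words seated at each table of this document
        table_dish = []              # topic served at each table
        topics = []
        for _ in range(n_words):
            t = crp_pick(table_counts, alpha0, rng)
            if t == len(table_counts):
                # New table: its topic comes from the top-level process
                # that is shared by all documents.
                k = crp_pick(dish_counts, gamma, rng)
                if k == len(dish_counts):
                    dish_counts.append(0)
                dish_counts[k] += 1
                table_counts.append(0)
                table_dish.append(k)
            table_counts[t] += 1
            topics.append(table_dish[t])
        docs.append(topics)
    return docs, dish_counts

rng = random.Random(2)
docs, dish_counts = chinese_restaurant_franchise([40, 40, 40], alpha0=1.0, gamma=1.5, rng=rng)
print("topics in use:", len(dish_counts))
```

Because new tables draw their topic from the shared top-level process,
the same topic index can appear in several documents, while the total
number of topics is left open.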