Mar 26, 2010

Data-Intensive Text Processing with MapReduce

A draft of the book Data-Intensive Text Processing with MapReduce, which covers parallel text algorithms with MapReduce, can be found here. The book has chapters covering graph algorithms (breadth-first traversal and PageRank) and learning HMMs using EM. The authors do a great job of presenting concepts with figures, which are comprehensive and intuitive.

Indeed, there is plenty of other interesting material that could go into a book on text processing algorithms with MapReduce: for example, parallel latent topic models such as latent Dirichlet allocation, and tree pruning/learning algorithms for various purposes.
