Dec 20, 2009

Collapsed Gibbs Sampling of LDA on GPU

Thanks to Feng Yan, who sent me his newly published work on parallel inference of LDA on GPUs.

The basic motivation is that on a GPU, the graphics card memory is too small to hold a separate copy of the nwk matrix for each core. So the basic requirement is to keep a single global nwk matrix shared by all cores. This in turn requires that when multiple cores sample in parallel, they never update the same element of nwk simultaneously. Feng's solution is to partition the training data not only by documents but also by words. This is viable due to the observation that:
  • for word w1 in document j1 and word w2 in document j2, if w1 != w2 and j1 != j2, simultaneous updates of their topic assignments cause no read/write conflicts on either the document-topic matrix njk or the word-topic matrix nwk.
Feng also presents a preprocessing algorithm that computes an optimal data partition with the goal of load balancing.
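
To make the observation concrete, here is a minimal Python sketch of one way such a partition could be scheduled. It is my own illustration, not Feng's code: the function names diagonal_schedule and partition_tokens are hypothetical, the rotating "diagonal" assignment of word groups to workers is an assumption about how the blocks might be ordered, and the groups are formed by plain modulo hashing rather than by the load-balancing preprocessing step. The loop at the end only checks that, within one epoch, no two workers touch the same document or the same word.

```python
def diagonal_schedule(num_workers):
    """Yield, for each epoch, one (doc_group, word_group) block per worker.

    Assumption: worker p takes doc group p and word group (p + epoch) mod P,
    so the blocks handled in one epoch share no doc group and no word group.
    """
    for epoch in range(num_workers):
        yield [(p, (p + epoch) % num_workers) for p in range(num_workers)]


def partition_tokens(tokens, num_workers):
    """Bucket (doc_id, word_id) tokens into a P x P grid of blocks.

    Assumption: groups come from simple modulo hashing, with no load balancing.
    """
    blocks = {(m, n): [] for m in range(num_workers) for n in range(num_workers)}
    for doc_id, word_id in tokens:
        key = (doc_id % num_workers, word_id % num_workers)
        blocks[key].append((doc_id, word_id))
    return blocks


# Toy corpus: each token is a (document id, word id) pair.
tokens = [(0, 0), (0, 1), (1, 0), (1, 2), (2, 1), (2, 2), (3, 3), (4, 5)]
P = 3
blocks = partition_tokens(tokens, P)

for epoch, assignment in enumerate(diagonal_schedule(P)):
    docs_seen, words_seen = set(), set()
    for worker, block in enumerate(assignment):
        worker_docs = {d for d, _ in blocks[block]}
        worker_words = {w for _, w in blocks[block]}
        # No two workers in the same epoch may share a document or a word,
        # so their updates hit different rows of njk and of nwk.
        assert docs_seen.isdisjoint(worker_docs)
        assert words_seen.isdisjoint(worker_words)
        docs_seen |= worker_docs
        words_seen |= worker_words
```

After P epochs every block of the P x P grid has been visited exactly once, so each token is sampled once per sweep. The sketch only checks disjointness of documents and words; it does not model the sampler itself or any per-topic totals.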


jeremy said...

I wonder how this idea deals with the global topic count since every modification on a shared matrix has an effect on the global topic count, which would then slow down all processes to close to serial time if a lock is needed.

jeremy said...

There would be a conflict if w1's new topic equals w2's old or new topic, and vice versa.