The basic motivation is that on a GPU, the graphics card memory is too small to hold a separate copy of the word-topic matrix nwk for each core. So the basic requirement is to keep a single global nwk matrix shared by all cores. This brings a new requirement: when multiple cores sample in parallel, they must not update the same element of nwk simultaneously. Feng's solution is to partition the training data not only by documents but also by words. This is viable due to the observation that:
- for word w1 in document j1 and word w2 in document j2, if w1 != w2 and j1 != j2, then simultaneous updates of their topic assignments have no read/write conflicts on the document-topic matrix njk or the word-topic matrix nwk (see the scheduling sketch below).
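To make the partition idea concrete, here is a minimal sketch (mine, not code from Feng's paper) of one possible schedule: documents and the vocabulary are each split into P partitions, and in sub-epoch s worker p processes only the tokens whose document lies in document partition p and whose word lies in word partition (p + s) mod P. The worker count P and the function name block_schedule are illustrative assumptions.

```python
# Sketch of a conflict-free block schedule for P parallel workers.
# Within one sub-epoch, all workers have distinct document partitions
# (so rows of njk are disjoint) and distinct word partitions
# (so rows of nwk are disjoint).

P = 4  # number of parallel workers (illustrative)

def block_schedule(num_workers):
    """Yield, for each sub-epoch, the (doc_partition, word_partition)
    pair assigned to every worker."""
    for sub_epoch in range(num_workers):
        yield [(p, (p + sub_epoch) % num_workers) for p in range(num_workers)]

if __name__ == "__main__":
    for s, assignment in enumerate(block_schedule(P)):
        # Check: word partitions within a sub-epoch are all distinct,
        # so concurrent updates to nwk rows cannot collide.
        word_parts = [w for _, w in assignment]
        assert len(set(word_parts)) == P
        print(f"sub-epoch {s}: (doc partition, word partition) per worker: {assignment}")
```

After P sub-epochs every (document partition, word partition) block has been visited exactly once, so a full sweep over the corpus is completed without any two workers ever sharing a document or a word at the same time.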
2 comments:
I wonder how this idea deals with the global topic count (the per-topic totals nk): every modification to the shared matrix also changes the global topic count, and if a lock is needed there, all processes would slow down to nearly serial speed.
There would be a conflict if w1's new topic equals w2's old or new topic, and vice versa.
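As one possible answer to the lock concern raised above (my sketch, not something stated in the post or confirmed from Feng's paper): each core can keep a private delta of the global topic count and merge the deltas only at synchronization points, so no per-update lock is needed. The names nk, record_topic_change, and synchronize are illustrative.

```python
import numpy as np

K = 8  # number of topics (illustrative)
P = 4  # number of workers (illustrative)

# Shared global topic counts, plus one private delta vector per worker.
nk = np.zeros(K, dtype=np.int64)
local_deltas = [np.zeros(K, dtype=np.int64) for _ in range(P)]

def record_topic_change(worker, old_topic, new_topic):
    """A worker reassigns one token from old_topic to new_topic; it only
    touches its private delta, so the shared nk needs no lock here."""
    local_deltas[worker][old_topic] -= 1
    local_deltas[worker][new_topic] += 1

def synchronize():
    """Fold every worker's private delta into the shared count and reset it,
    e.g. at the end of a sub-epoch."""
    global nk
    for d in local_deltas:
        nk += d
        d[:] = 0
```

Between synchronization points each worker then samples against a slightly stale global count, which is the usual trade-off in schemes of this kind.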