At first, I looked at the C++ code that splits a line into words. I tried the STL istringstream, C's strtok_r, and a string-splitting trick that I learned at Google. However, none of these choices noticeably affected performance.
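For reference, here is a minimal sketch of what such splitters might look like. The function names are made up for illustration, and the third variant is only a generic hand-rolled splitter built on std::string::find_first_of; it is an assumption, not the trick mentioned above, which this post does not show.

```cpp
#include <string.h>  // strtok_r (POSIX)

#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Variant 1: let std::istringstream skip whitespace and extract words.
std::vector<std::string> split_istringstream(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) words.push_back(word);
    return words;
}

// Variant 2: POSIX strtok_r. It writes '\0' into its input,
// so it runs on a mutable, NUL-terminated copy of the line.
std::vector<std::string> split_strtok_r(const std::string& line) {
    std::vector<char> buf(line.begin(), line.end());
    buf.push_back('\0');
    std::vector<std::string> words;
    char* save = nullptr;
    for (char* tok = strtok_r(buf.data(), " \t\r\n", &save); tok != nullptr;
         tok = strtok_r(nullptr, " \t\r\n", &save)) {
        words.emplace_back(tok);
    }
    return words;
}

// Variant 3 (hypothetical hand-rolled splitter): scan for delimiter
// boundaries with find_first_not_of / find_first_of.
std::vector<std::string> split_find(const std::string& line) {
    const char* delims = " \t\r\n";
    std::vector<std::string> words;
    std::size_t start = line.find_first_not_of(delims);
    while (start != std::string::npos) {
        std::size_t end = line.find_first_of(delims, start);
        words.push_back(line.substr(start, end - start));
        start = line.find_first_not_of(delims, end);
    }
    return words;
}
```

All three return the same words for whitespace-separated input, which is consistent with the observation that the choice of splitter is unlikely to be where the time goes.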
Then I recalled the lesson I learned from parallel LDA (a machine learning method that has been a key part of my research for over three years): map
On the other hand, I strongly suspect that AWK, an interpreted language, implements maps with string keys using a trie-based data structure.
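To make the idea concrete, here is a minimal sketch of a trie that maps string keys to counts. It is purely illustrative: the class name TrieCounter and its methods are hypothetical, and nothing here is taken from any AWK implementation, whose internals remain only a suspicion.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// A toy trie keyed by strings, storing one counter per key.
// Lookup cost depends on the key length, not on the number of keys.
class TrieCounter {
public:
    // Walk (and create, if needed) one node per character, then bump the count.
    void increment(const std::string& key) {
        Node* node = &root_;
        for (char c : key) {
            std::unique_ptr<Node>& child = node->children[c];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        ++node->count;
    }

    // Return how many times key was incremented (0 if never seen).
    std::size_t count(const std::string& key) const {
        const Node* node = &root_;
        for (char c : key) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return 0;
            node = it->second.get();
        }
        return node->count;
    }

private:
    struct Node {
        std::size_t count = 0;
        // Children indexed by the next character of the key.
        std::map<char, std::unique_ptr<Node>> children;
    };
    Node root_;
};
```

After calling increment("map") twice and increment("maps") once, count("map") is 2, count("maps") is 1, and a bare prefix such as "ma" still counts as 0.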