<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-6983561823392851671</id><updated>2012-01-11T17:39:04.823-08:00</updated><title type='text'>Tech Notes of Yi Wang</title><subtitle type='html'>I write my learning notes, record my ideas, and document my works here, so I can search them conveniently using Google's search technology.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default?start-index=101&amp;max-results=100'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>135</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-5185996645719363562</id><published>2011-07-24T16:47:00.000-07:00</published><updated>2011-07-24T16:49:33.639-07:00</updated><title 
type='text'>Rendering Dot Graphs in Web Pages (Make Apache a Graphviz Server)</title><content type='html'>&lt;a href="http://cxwangyi.wordpress.com/2011/07/24/%E5%9C%A8web%E9%A1%B5%E9%9D%A2%E9%87%8C%E6%B8%B2%E6%9F%93graphiviz%E5%9B%BE%E5%BD%A2/"&gt;&lt;/a&gt;&lt;h2 style="font-family: 'Trebuchet MS', 'Lucida Grande', Verdana, Arial, sans-serif; font-weight: bold; font-size: 1.8em; color: rgb(51, 51, 51); text-decoration: none; margin-top: 30px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "&gt;&lt;a rel="bookmark" title="Permanent Link: Rendering Dot Graphs in Web Pages (Graphviz Server)" style="color: rgb(51, 51, 51); text-decoration: none; "&gt;Rendering Dot Graphs in Web Pages (Make Apache a Graphviz Server)&lt;/a&gt;&lt;/h2&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-5185996645719363562?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/5185996645719363562/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=5185996645719363562' title='0 Comments'/><link rel='edit' 
type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5185996645719363562'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5185996645719363562'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2011/07/webdotgraphiviz-server.html' title='Rendering Dot Graphs in Web Pages (Make Apache a Graphviz Server)'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-199806376550909902</id><published>2011-03-25T10:23:00.000-07:00</published><updated>2011-03-25T10:24:09.174-07:00</updated><title type='text'>Gibbs Sampling of Latent Dirichlet Allocation Model in Go</title><content type='html'>As my first exercise in Google’s Go programming language, I wrote a Gibbs sampling training program for LDA in Go. 
This open source project is hosted at http://code.google.com/p/lda-go/, as a sister project of my earlier parallel LDA training and inference project in C++: http://code.google.com/p/ompi-lda/.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-199806376550909902?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/199806376550909902/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=199806376550909902' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/199806376550909902'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/199806376550909902'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2011/03/gibbs-sampling-of-latent-dirichlet.html' title='Gibbs Sampling of Latent Dirichlet Allocation Model in Go'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-7873809592413711705</id><published>2010-08-17T17:33:00.001-07:00</published><updated>2010-08-17T17:43:49.109-07:00</updated><title type='text'>Parallel LDA Gibbs Sampling Using OpenMP and MPI</title><content type='html'>I created an open source 
project, &lt;a href="http://code.google.com/p/ompi-lda/"&gt;ompi-lda&lt;/a&gt;, on Google Code.  This project is inspired by another &lt;a href="http://code.google.com/p/ompi-lda/"&gt;LDA project&lt;/a&gt;, which I started at Google, and by a recent parallel programming effort using OpenMP by &lt;a href="http://xlvector.net/blog/?p=579&amp;amp;utm_source=feedburner&amp;amp;utm_medium=feed&amp;amp;utm_campaign=Feed%3A+blogspot%2FSHpi+%28xlvector+-+Recommender+System%29&amp;amp;utm_content=Google+Reader"&gt;xlvector&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-7873809592413711705?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/7873809592413711705/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=7873809592413711705' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7873809592413711705'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7873809592413711705'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/08/parallel-lda-gibbs-sampling-using.html' title='Parallel LDA Gibbs Sampling Using OpenMP and MPI'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-6091353863942528042</id><published>2010-06-28T19:50:00.000-07:00</published><updated>2010-06-28T19:53:52.231-07:00</updated><title type='text'>I Moved My Blog to Wordpress</title><content type='html'>Since I found it is convenient to insert code and LaTeX math into my posts using Wordpress, I moved this blog to&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://cxwangyi.wordpress.com/"&gt;http://cxwangyi.wordpress.com/&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;I may no longer update this blog.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-6091353863942528042?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/6091353863942528042/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=6091353863942528042' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6091353863942528042'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6091353863942528042'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/06/i-moved-my-blog-to-wordpress.html' title='I Moved My Blog to Wordpress'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
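src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-1192493741782156782</id><published>2010-06-21T02:03:00.000-07:00</published><updated>2010-06-21T02:52:43.613-07:00</updated><title type='text'>Google Protocol Buffers in Practice: Parsing .proto Files and Arbitrary Data Files</title><content type='html'>Google Protocol Buffers is a convenient and efficient data serialization format that is used in almost every Google product. This post describes an advanced way of using protocol buffers that is not covered on the Google Protocol Buffers home page.&lt;br /&gt;&lt;br /&gt;Protocol Buffers are usually used as follows: our program depends on only a small number of protocol messages. We define these messages in one or more .proto files, and then use the compiler protoc to translate the .proto files into C++ (.h and .cc files). This translation turns each message into a C++ class, so our program can use these protocol classes.&lt;br /&gt;&lt;br /&gt;There is also a less common usage: we have a set of data files (or network data streams) that contain protocol messages, and we have the .proto files that define those messages. We want to parse the contents of the data files, but we cannot use the protoc compiler.&lt;br /&gt;&lt;br /&gt;One example is codex, one of the most commonly used tools inside Google. It can parse the protocol message content of arbitrary files and print that content in a human-readable format. To parse and print a data file correctly, codex needs the .proto file that defines the protocol message.&lt;br /&gt;&lt;br /&gt;To implement what codex does, one ad hoc approach is:&lt;br /&gt;1. Implement the basic functions of codex (such as reading data files and printing their contents) in a .cc file (say, codex-base.cc).&lt;br /&gt;2. For a given .proto file, run protoc to get the corresponding .pb.h and .pb.cc files.&lt;br /&gt;3. Compile codex-base.cc together with the protoc output to build a codex program dedicated to parsing one particular protocol message.&lt;br /&gt;This approach is too ad hoc: it builds a separate codex program for every .proto file.&lt;br /&gt;&lt;br /&gt;Another approach: if we had every .proto file in the world, we could compile them all with protoc in advance and link them into codex. Obviously this is not realistic either.&lt;br /&gt;&lt;br /&gt;So how is codex actually implemented? It relies on some protocol buffers APIs that are not described in the documentation. Without further ado, let us look at some code written with these "mysterious APIs". This code parses any given .proto file:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;span class="preprocessor"&gt;#include&lt;/span&gt; &lt;span class="string"&gt;&amp;lt;google/protobuf/descriptor.h&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="preprocessor"&gt;#include&lt;/span&gt; &lt;span 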
class="string"&gt;&amp;lt;google/protobuf/dynamic_message.h&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="preprocessor"&gt;#include&lt;/span&gt; &lt;span class="string"&gt;&amp;lt;google/protobuf/io/zero_copy_stream_impl.h&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="preprocessor"&gt;#include&lt;/span&gt; &lt;span class="string"&gt;&amp;lt;google/protobuf/io/tokenizer.h&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="preprocessor"&gt;#include&lt;/span&gt; &lt;span class="string"&gt;&amp;lt;google/protobuf/compiler/parser.h&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="comment-delimiter"&gt;//&lt;/span&gt;&lt;span class="comment"&gt;-----------------------------------------------------------------------------&lt;br /&gt;&lt;/span&gt;&lt;span class="comment-delimiter"&gt;// &lt;/span&gt;&lt;span class="comment"&gt;Parsing given .proto file for Descriptor of the given message (by&lt;br /&gt;&lt;/span&gt;&lt;span class="comment-delimiter"&gt;// &lt;/span&gt;&lt;span class="comment"&gt;name).  The returned message descriptor can be used with a&lt;br /&gt;&lt;/span&gt;&lt;span class="comment-delimiter"&gt;// &lt;/span&gt;&lt;span class="comment"&gt;DynamicMessageFactory in order to create prototype message and&lt;br /&gt;&lt;/span&gt;&lt;span class="comment-delimiter"&gt;// &lt;/span&gt;&lt;span class="comment"&gt;mutable messages.  
For example:&lt;br /&gt;&lt;/span&gt;&lt;span class="comment-delimiter"&gt;/*&lt;/span&gt;&lt;span class="comment"&gt;&lt;br /&gt;DynamicMessageFactory factory;&lt;br /&gt;const Message* prototype_msg = factory.GetPrototype(message_descriptor);&lt;br /&gt;Message* mutable_msg = prototype_msg-&amp;gt;New();&lt;br /&gt;*/&lt;/span&gt;&lt;br /&gt;&lt;span class="comment-delimiter"&gt;//&lt;/span&gt;&lt;span class="comment"&gt;-----------------------------------------------------------------------------&lt;br /&gt;&lt;/span&gt;&lt;span class="type"&gt;void&lt;/span&gt; &lt;span class="function-name"&gt;GetMessageTypeFromProtoFile&lt;/span&gt;(&lt;span class="keyword"&gt;const&lt;/span&gt; &lt;span class="type"&gt;string&lt;/span&gt;&amp;amp; &lt;span class="variable-name"&gt;proto_filename&lt;/span&gt;,&lt;br /&gt;                              FileDescriptorProto* file_desc_proto) {&lt;br /&gt;&lt;span class="keyword"&gt;using&lt;/span&gt; &lt;span class="keyword"&gt;namespace&lt;/span&gt; &lt;span class="constant"&gt;google&lt;/span&gt;::&lt;span class="constant"&gt;protobuf&lt;/span&gt;;&lt;br /&gt;&lt;span class="keyword"&gt;using&lt;/span&gt; &lt;span class="keyword"&gt;namespace&lt;/span&gt; &lt;span class="constant"&gt;google&lt;/span&gt;::&lt;span class="constant"&gt;protobuf&lt;/span&gt;::&lt;span class="constant"&gt;io&lt;/span&gt;;&lt;br /&gt;&lt;span class="keyword"&gt;using&lt;/span&gt; &lt;span class="keyword"&gt;namespace&lt;/span&gt; &lt;span class="constant"&gt;google&lt;/span&gt;::&lt;span class="constant"&gt;protobuf&lt;/span&gt;::&lt;span class="constant"&gt;compiler&lt;/span&gt;;&lt;br /&gt;&lt;br /&gt;&lt;span class="type"&gt;FILE&lt;/span&gt;* &lt;span class="variable-name"&gt;proto_file&lt;/span&gt; = fopen(proto_filename.c_str(), &lt;span class="string"&gt;"r"&lt;/span&gt;);&lt;br /&gt;{&lt;br /&gt; &lt;span class="keyword"&gt;if&lt;/span&gt; (proto_file == &lt;span class="constant"&gt;NULL&lt;/span&gt;) {&lt;br /&gt;   LOG(FATAL) 
&amp;lt;&amp;lt; &lt;span class="string"&gt;"Cannot open .proto file: "&lt;/span&gt; &amp;lt;&amp;lt; proto_filename;&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt; &lt;span class="type"&gt;FileInputStream&lt;/span&gt; &lt;span class="variable-name"&gt;proto_input_stream&lt;/span&gt;(&lt;span class="type"&gt;fileno&lt;/span&gt;(&lt;span class="variable-name"&gt;proto_file&lt;/span&gt;));&lt;br /&gt; &lt;span class="type"&gt;Tokenizer&lt;/span&gt; &lt;span class="variable-name"&gt;tokenizer&lt;/span&gt;(&amp;amp;proto_input_stream, &lt;span class="constant"&gt;NULL&lt;/span&gt;);&lt;br /&gt; &lt;span class="type"&gt;Parser&lt;/span&gt; &lt;span class="variable-name"&gt;parser&lt;/span&gt;;&lt;br /&gt; &lt;span class="keyword"&gt;if&lt;/span&gt; (&lt;span class="negation-char"&gt;!&lt;/span&gt;parser.Parse(&amp;amp;tokenizer, file_desc_proto)) {&lt;br /&gt;   LOG(FATAL) &amp;lt;&amp;lt; &lt;span class="string"&gt;"Cannot parse .proto file: "&lt;/span&gt; &amp;lt;&amp;lt; proto_filename;&lt;br /&gt; }&lt;br /&gt;}&lt;br /&gt;fclose(proto_file);&lt;br /&gt;&lt;br /&gt;&lt;span class="comment-delimiter"&gt;// &lt;/span&gt;&lt;span class="comment"&gt;Here we work around a bug in protocol buffers:&lt;br /&gt;&lt;/span&gt;  &lt;span class="comment-delimiter"&gt;// &lt;/span&gt;&lt;span class="comment"&gt;|Parser::Parse| does not set name (.proto filename) in&lt;br /&gt;&lt;/span&gt;  &lt;span class="comment-delimiter"&gt;// &lt;/span&gt;&lt;span class="comment"&gt;file_desc_proto.&lt;br /&gt;&lt;/span&gt;  &lt;span class="keyword"&gt;if&lt;/span&gt; (&lt;span class="negation-char"&gt;!&lt;/span&gt;file_desc_proto-&amp;gt;has_name()) {&lt;br /&gt; file_desc_proto-&amp;gt;set_name(proto_filename);&lt;br /&gt;}&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;The input of this function is the name of a .proto file, and the output is a FileDescriptorProto object that holds the result of parsing that file. Next, we use this result to dynamically create an instance (an object, in C++ terms) of a particular protocol message. We can then call the instance's own ParseFromArray or ParseFromString member function to parse each record in the data file. See the following code:&lt;br /&gt;&lt;br 
/&gt;&lt;pre&gt;&lt;span class="comment-delimiter"&gt;//&lt;/span&gt;&lt;span class="comment"&gt;-----------------------------------------------------------------------------&lt;br /&gt;&lt;/span&gt;&lt;span class="comment-delimiter"&gt;// &lt;/span&gt;&lt;span class="comment"&gt;Print contents of a record file with following format:&lt;br /&gt;&lt;/span&gt;&lt;span class="comment-delimiter"&gt;//&lt;/span&gt;&lt;span class="comment"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="comment-delimiter"&gt;//   &lt;/span&gt;&lt;span class="comment"&gt;{ &amp;lt;int record_size&amp;gt; &amp;lt;KeyValuePair&amp;gt; }&lt;br /&gt;&lt;/span&gt;&lt;span class="comment-delimiter"&gt;//&lt;/span&gt;&lt;span class="comment"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span class="comment-delimiter"&gt;// &lt;/span&gt;&lt;span class="comment"&gt;where KeyValuePair is a proto message defined in mpimr.proto, and&lt;br /&gt;&lt;/span&gt;&lt;span class="comment-delimiter"&gt;// &lt;/span&gt;&lt;span class="comment"&gt;consists of two string fields: key and value, where key will be&lt;br /&gt;&lt;/span&gt;&lt;span class="comment-delimiter"&gt;// &lt;/span&gt;&lt;span class="comment"&gt;printed as a text string, and value will be parsed into a proto&lt;br /&gt;&lt;/span&gt;&lt;span class="comment-delimiter"&gt;// &lt;/span&gt;&lt;span class="comment"&gt;message given as |message_descriptor|.&lt;br /&gt;&lt;/span&gt;&lt;span class="comment-delimiter"&gt;//&lt;/span&gt;&lt;span class="comment"&gt;-----------------------------------------------------------------------------&lt;br /&gt;&lt;/span&gt;&lt;span class="type"&gt;void&lt;/span&gt; &lt;span class="function-name"&gt;PrintDataFile&lt;/span&gt;(&lt;span class="keyword"&gt;const&lt;/span&gt; &lt;span class="type"&gt;string&lt;/span&gt;&amp;amp; &lt;span class="variable-name"&gt;data_filename&lt;/span&gt;,&lt;br /&gt;                &lt;span class="keyword"&gt;const&lt;/span&gt; &lt;span class="type"&gt;FileDescriptorProto&lt;/span&gt;&amp;amp; 
&lt;span class="variable-name"&gt;file_desc_proto&lt;/span&gt;,&lt;br /&gt;                &lt;span class="keyword"&gt;const&lt;/span&gt; &lt;span class="type"&gt;string&lt;/span&gt;&amp;amp; &lt;span class="variable-name"&gt;message_name&lt;/span&gt;) {&lt;br /&gt;&lt;span class="keyword"&gt;const&lt;/span&gt; &lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;kMaxRecieveBufferSize&lt;/span&gt; = 32 * 1024 * 1024;  &lt;span class="comment-delimiter"&gt;// &lt;/span&gt;&lt;span class="comment"&gt;32MB&lt;br /&gt;&lt;/span&gt;  &lt;span class="keyword"&gt;static&lt;/span&gt; &lt;span class="type"&gt;char&lt;/span&gt; &lt;span class="variable-name"&gt;buffer&lt;/span&gt;[kMaxRecieveBufferSize];&lt;br /&gt;&lt;br /&gt;&lt;span class="type"&gt;ifstream&lt;/span&gt; &lt;span class="variable-name"&gt;input_stream&lt;/span&gt;(data_filename.c_str());&lt;br /&gt;&lt;span class="keyword"&gt;if&lt;/span&gt; (&lt;span class="negation-char"&gt;!&lt;/span&gt;input_stream.is_open()) {&lt;br /&gt; LOG(FATAL) &amp;lt;&amp;lt; &lt;span class="string"&gt;"Cannot open data file: "&lt;/span&gt; &amp;lt;&amp;lt; data_filename;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;span class="constant"&gt;google&lt;/span&gt;::&lt;span class="constant"&gt;protobuf&lt;/span&gt;::&lt;span class="type"&gt;DescriptorPool&lt;/span&gt; &lt;span class="variable-name"&gt;pool&lt;/span&gt;;&lt;br /&gt;&lt;span class="keyword"&gt;const&lt;/span&gt; &lt;span class="constant"&gt;google&lt;/span&gt;::&lt;span class="constant"&gt;protobuf&lt;/span&gt;::&lt;span class="type"&gt;FileDescriptor&lt;/span&gt;* &lt;span class="variable-name"&gt;file_desc&lt;/span&gt; =&lt;br /&gt; pool.BuildFile(file_desc_proto);&lt;br /&gt;&lt;span class="keyword"&gt;if&lt;/span&gt; (file_desc == &lt;span class="constant"&gt;NULL&lt;/span&gt;) {&lt;br /&gt; LOG(FATAL) &amp;lt;&amp;lt; &lt;span class="string"&gt;"Cannot get file descriptor from file descriptor"&lt;/span&gt;&lt;br /&gt;            &amp;lt;&amp;lt; 
file_desc_proto.DebugString();&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;span class="keyword"&gt;const&lt;/span&gt; &lt;span class="constant"&gt;google&lt;/span&gt;::&lt;span class="constant"&gt;protobuf&lt;/span&gt;::&lt;span class="type"&gt;Descriptor&lt;/span&gt;* &lt;span class="variable-name"&gt;message_desc&lt;/span&gt; =&lt;br /&gt; file_desc-&amp;gt;FindMessageTypeByName(message_name);&lt;br /&gt;&lt;span class="keyword"&gt;if&lt;/span&gt; (message_desc == &lt;span class="constant"&gt;NULL&lt;/span&gt;) {&lt;br /&gt; LOG(FATAL) &amp;lt;&amp;lt; &lt;span class="string"&gt;"Cannot get message descriptor of message: "&lt;/span&gt; &amp;lt;&amp;lt; message_name;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;span class="constant"&gt;google&lt;/span&gt;::&lt;span class="constant"&gt;protobuf&lt;/span&gt;::&lt;span class="type"&gt;DynamicMessageFactory&lt;/span&gt; &lt;span class="variable-name"&gt;factory&lt;/span&gt;;&lt;br /&gt;&lt;span class="keyword"&gt;const&lt;/span&gt; &lt;span class="constant"&gt;google&lt;/span&gt;::&lt;span class="constant"&gt;protobuf&lt;/span&gt;::&lt;span class="type"&gt;Message&lt;/span&gt;* &lt;span class="variable-name"&gt;prototype_msg&lt;/span&gt; =&lt;br /&gt; factory.GetPrototype(message_desc);&lt;br /&gt;&lt;span class="keyword"&gt;if&lt;/span&gt; (prototype_msg == &lt;span class="constant"&gt;NULL&lt;/span&gt;) {&lt;br /&gt; LOG(FATAL) &amp;lt;&amp;lt; &lt;span class="string"&gt;"Cannot create prototype message from message descriptor"&lt;/span&gt;;&lt;br /&gt;}&lt;br /&gt;&lt;span class="constant"&gt;google&lt;/span&gt;::&lt;span class="constant"&gt;protobuf&lt;/span&gt;::&lt;span class="type"&gt;Message&lt;/span&gt;* &lt;span class="variable-name"&gt;mutable_msg&lt;/span&gt; = prototype_msg-&amp;gt;New();&lt;br /&gt;&lt;span class="keyword"&gt;if&lt;/span&gt; (mutable_msg == &lt;span class="constant"&gt;NULL&lt;/span&gt;) {&lt;br /&gt; LOG(FATAL) &amp;lt;&amp;lt; &lt;span class="string"&gt;"Failed in prototype_msg-&amp;gt;New(); 
to create mutable message"&lt;/span&gt;;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;span class="type"&gt;uint32&lt;/span&gt; &lt;span class="variable-name"&gt;proto_msg_size&lt;/span&gt;; &lt;span class="comment-delimiter"&gt;// &lt;/span&gt;&lt;span class="comment"&gt;uint32 is the type used in record files.&lt;br /&gt;&lt;/span&gt;  &lt;span class="keyword"&gt;for&lt;/span&gt; (;;) {&lt;br /&gt; input_stream.read((&lt;span class="type"&gt;char&lt;/span&gt;*)&amp;amp;proto_msg_size, &lt;span class="keyword"&gt;sizeof&lt;/span&gt;(proto_msg_size));&lt;br /&gt; &lt;span class="keyword"&gt;if&lt;/span&gt; (&lt;span class="negation-char"&gt;!&lt;/span&gt;input_stream)&lt;br /&gt;   &lt;span class="keyword"&gt;break&lt;/span&gt;;  &lt;span class="comment-delimiter"&gt;// &lt;/span&gt;&lt;span class="comment"&gt;End of file: no more records.&lt;br /&gt;&lt;/span&gt;&lt;br /&gt; &lt;span class="keyword"&gt;if&lt;/span&gt; (proto_msg_size &amp;gt; kMaxRecieveBufferSize) {&lt;br /&gt;   LOG(FATAL) &amp;lt;&amp;lt; &lt;span class="string"&gt;"Failed to read a proto message with size = "&lt;/span&gt;&lt;br /&gt;              &amp;lt;&amp;lt; proto_msg_size&lt;br /&gt;              &amp;lt;&amp;lt; &lt;span class="string"&gt;", which is larger than kMaxRecieveBufferSize ("&lt;/span&gt;&lt;br /&gt;              &amp;lt;&amp;lt; kMaxRecieveBufferSize &amp;lt;&amp;lt; &lt;span class="string"&gt;"). "&lt;/span&gt;&lt;br /&gt;              &amp;lt;&amp;lt; &lt;span class="string"&gt;"You can modify kMaxRecieveBufferSize defined in "&lt;/span&gt;&lt;br /&gt;              &amp;lt;&amp;lt; __FILE__;&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt; input_stream.read(buffer, proto_msg_size);&lt;br /&gt; &lt;span class="keyword"&gt;if&lt;/span&gt; (&lt;span class="negation-char"&gt;!&lt;/span&gt;input_stream)&lt;br /&gt;   &lt;span class="keyword"&gt;break&lt;/span&gt;;&lt;br /&gt;&lt;br /&gt; &lt;span class="keyword"&gt;if&lt;/span&gt; (&lt;span class="negation-char"&gt;!&lt;/span&gt;mutable_msg-&amp;gt;ParseFromArray(buffer, proto_msg_size)) {&lt;br /&gt;   LOG(FATAL) &amp;lt;&amp;lt; &lt;span class="string"&gt;"Failed to parse a record in data file: "&lt;/span&gt; &amp;lt;&amp;lt; data_filename;&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt; cout &amp;lt;&amp;lt; mutable_msg-&amp;gt;DebugString();&lt;br 
/&gt;}&lt;br /&gt;&lt;br /&gt;&lt;span class="keyword"&gt;delete&lt;/span&gt; mutable_msg;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;This function takes three inputs: 1) the name of the data file, 2) the FileDescriptorProto object filled in by GetMessageTypeFromProtoFile above, and 3) the name of the protocol message that each record in the data file corresponds to. (Note that a .proto file can define multiple protocol messages, so we need to know exactly which message the records correspond to.)&lt;br /&gt;&lt;br /&gt;The code above uses a DescriptorPool to build a FileDescriptor (which describes all the messages in the .proto file) from the FileDescriptorProto. It then calls FindMessageTypeByName on the FileDescriptor to find the Descriptor of the message we are interested in. Next, we use a DynamicMessageFactory to get a prototype message instance from that Descriptor. Note that this instance is immutable: we cannot write content into it. We need to call its New member function to create a mutable instance.&lt;br /&gt;&lt;br /&gt;Once we have a message instance that corresponds to the data records, the rest is easy. We read every record in the data file. Note: here we assume that the data file stores &amp;lt;record_length&amp;gt;&amp;lt;record_content&amp;gt; pairs one after another: the length of the first record, then its content, then the length and content of the second record, and so on. So in the function above, we loop, reading a record length and then parsing the record content. It is worth noting that the parsing uses the ParseFromArray function of the mutable message instance, which needs to know the length of the record. This is why we must store the length of each record in the data file.&lt;br /&gt;&lt;br /&gt;The following program shows how to call GetMessageTypeFromProtoFile and PrintDataFile:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="function-name"&gt;main&lt;/span&gt;(&lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;argc&lt;/span&gt;, &lt;span class="type"&gt;char&lt;/span&gt;** &lt;span class="variable-name"&gt;argv&lt;/span&gt;) {&lt;br /&gt;&lt;span class="type"&gt;string&lt;/span&gt; &lt;span class="variable-name"&gt;proto_filename&lt;/span&gt;, &lt;span class="variable-name"&gt;message_name&lt;/span&gt;;&lt;br /&gt;&lt;span class="type"&gt;vector&lt;/span&gt;&amp;lt;&lt;span class="type"&gt;string&lt;/span&gt;&amp;gt; &lt;span class="variable-name"&gt;data_filenames&lt;/span&gt;;&lt;br /&gt;&lt;span class="type"&gt;FileDescriptorProto&lt;/span&gt; &lt;span class="variable-name"&gt;file_desc_proto&lt;/span&gt;;&lt;br /&gt;&lt;br /&gt;ParseCmdLine(argc, argv, &amp;amp;proto_filename, &amp;amp;message_name, 
&amp;amp;data_filenames);&lt;br /&gt;GetMessageTypeFromProtoFile(proto_filename, &amp;amp;file_desc_proto);&lt;br /&gt;&lt;br /&gt;&lt;span class="keyword"&gt;for&lt;/span&gt; (&lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;i&lt;/span&gt; = 0; i &amp;lt; data_filenames.size(); ++i) {&lt;br /&gt; PrintDataFile(data_filenames[i], file_desc_proto, message_name);&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;span class="keyword"&gt;return&lt;/span&gt; 0;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-1192493741782156782?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/1192493741782156782/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=1192493741782156782' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1192493741782156782'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1192493741782156782'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/06/google-protocol-buffers-proto.html' title='Google Protocol Buffers in Practice: Parsing .proto Files and Arbitrary Data Files'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-4268648438190156077</id><published>2010-06-10T00:49:00.000-07:00</published><updated>2010-06-10T00:53:16.886-07:00</updated><title type='text'>Intel’s Single-chip Cluster Computer</title><content type='html'>I just read a &lt;a href="http://www.linux-mag.com/id/7762/?utm_source=feedburner&amp;amp;utm_medium=feed&amp;amp;utm_campaign=Feed%3A+LinuxMagazine+%28Linux+Magazine%3A+Top+Stories%29&amp;amp;utm_content=Google+Reader"&gt;post on Intel's single-chip cluster computer&lt;/a&gt;. It is a unique design: multiple cores on one chip, where each core has its own private, fast memory (its cache), and main memory becomes more like an NFS share.  Programmers write programs for this chip using MPI.&lt;br /&gt;&lt;br /&gt;So this chip is a cluster with a super-fast network connection (64GB/s) and very limited per-node memory (because that memory is implemented as cache).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-4268648438190156077?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/4268648438190156077/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=4268648438190156077' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4268648438190156077'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4268648438190156077'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/06/intels-single-chip-cluster-computer.html' title='Intel’s Single-chip 
Cluster Computer'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-579583072438084100</id><published>2010-06-05T05:35:00.000-07:00</published><updated>2010-06-05T05:52:15.434-07:00</updated><title type='text'>How to Install Dropbox in Ubuntu Using SOCKS Proxy</title><content type='html'>I live in China, behind the G.F.W., and have to access the Internet through an SSH tunnel.   However, the Dropbox installation procedure does not support either SOCKS5 or HTTP proxies. (Yes, the documentation says it works with the http_proxy environment variable, but it does not seem to.) Fortunately, two great tools, proxychains and tsocks, can launch any application and route its network traffic through a pre-configured proxy.  With the help of these tools, I can install Dropbox on a brand new Ubuntu machine behind the G.F.W.  The steps are as follows:&lt;div&gt;&lt;ol&gt;&lt;li&gt;Install tsocks (or proxychains) on Ubuntu using Synaptic.  Thanks to Ubuntu for making these tools standard packages.&lt;/li&gt;&lt;li&gt;Get an SSH tunnel (in any way you like).  I paid for an account, so I can set up a proxy tunnel with the following command line on my computer:&lt;br /&gt;&lt;pre&gt;ssh -D 7070 my-username@my-service-provider-url&lt;/pre&gt;&lt;/li&gt;&lt;li&gt;Configure your Web browser to use the tunnel.  In Firefox, select the SOCKS5 proxy localhost:7070.   This lets you access Dropbox's homepage and download the Ubuntu package.  &lt;/li&gt;&lt;li&gt;Install the package by clicking it in Nautilus.  
To check the installation, type the following command in a shell: &lt;pre&gt;dropbox start -i&lt;/pre&gt;  If you see error messages complaining about restricted network access, the package is installed correctly. &lt;/li&gt;&lt;li&gt;Add the following lines to &lt;tt&gt;/etc/tsocks.conf&lt;/tt&gt;:&lt;pre&gt;server = 127.0.0.1&lt;br /&gt;server_type = 5&lt;br /&gt;server_port = 7070&lt;/pre&gt;If you are using proxychains, you need to modify &lt;tt&gt;/etc/proxychains.conf&lt;/tt&gt; or create your own &lt;tt&gt;~/.proxychains/proxychains.conf&lt;/tt&gt;.&lt;/li&gt;&lt;li&gt;This time, use tsocks to launch the Dropbox online installation procedure:&lt;pre&gt;tsocks dropbox start -i&lt;/pre&gt;Cross your fingers and wait for it to download and install Dropbox, until the Dropbox icon appears in the top-right corner of your screen.&lt;/li&gt;&lt;li&gt;Right-click the Dropbox icon, select "Preferences", and set the SOCKS5 proxy as you did for Firefox.  Hopefully, Dropbox now starts to sync the files you need.&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;Good luck!&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-579583072438084100?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/579583072438084100/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=579583072438084100' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/579583072438084100'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/579583072438084100'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/06/how-to-install-dropbox-in-ubuntu-using.html' title='How to Install Dropbox in Ubuntu Using 
SOCKS Proxy'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-8912461352267670919</id><published>2010-06-04T06:07:00.000-07:00</published><updated>2010-06-04T09:35:15.307-07:00</updated><title type='text'>Building Distributed Programs Using GCC</title><content type='html'>In distributed computing, a program is built (into a binary) and distributed to multiple computers to run.  Often we do not want to install the libraries our program depends on on every worker machine.  Instead, we can instruct GCC to link libraries statically by setting the environment variable:&lt;br /&gt;&lt;pre&gt;LDFLAGS="-static -static-libgcc"&lt;/pre&gt;&lt;br /&gt;I have tried this method under Cygwin and Ubuntu Linux.  
However, if I do this under Darwin (Mac OS X 10.6 Snow Leopard), the linker complains that&lt;br /&gt;&lt;pre&gt;ld: library not found for -lcrt0.o&lt;/pre&gt;In &lt;a href="http://developer.apple.com/mac/library/qa/qa2001/qa1118.html"&gt;this Technical Q&amp;amp;A&lt;/a&gt;, Apple explains that they want to make Mac OS X upgrading easier, so they do not provide crt0.o to encourage dynamic linking.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-8912461352267670919?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/8912461352267670919/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=8912461352267670919' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8912461352267670919'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8912461352267670919'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/06/building-distributed-programs-using-gcc.html' title='Building Distributed Programs Using GCC'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-7535535036952998891</id><published>2010-06-03T22:51:00.000-07:00</published><updated>2010-06-03T22:54:18.449-07:00</updated><title type='text'>Uint64 Constants</title><content type='html'>If we write &lt;tt&gt;uint64 
a = 0xffffffffffffffff;&lt;/tt&gt;, the compiler often complains:&lt;pre&gt;error: integer constant is too large for ‘long’ type&lt;/pre&gt;  What we need is to add the &lt;tt&gt;LLU&lt;/tt&gt; suffix: &lt;tt&gt;uint64 a = 0xffffffffffffffffLLU;&lt;/tt&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-7535535036952998891?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/7535535036952998891/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=7535535036952998891' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7535535036952998891'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7535535036952998891'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/06/uint64-constants.html' title='Uint64 Constants'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-1066982653283437977</id><published>2010-06-02T19:32:00.000-07:00</published><updated>2010-06-02T19:36:29.546-07:00</updated><title type='text'>Launch a Series of Hadoop Pipes Task</title><content type='html'>It is OK to call &lt;tt&gt;runTask&lt;/tt&gt; multiple times from within &lt;tt&gt;main()&lt;/tt&gt; in a Hadoop Pipes program to launch a series of MapReduce tasks.  
However, one thing to remember: these tasks must not share an output directory, because the Hadoop runtime refuses to start a MapReduce task whose output directory already exists.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-1066982653283437977?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/1066982653283437977/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=1066982653283437977' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1066982653283437977'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1066982653283437977'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/06/launch-series-of-hadoop-pipes-tasks.html' title='Launch a Series of Hadoop Pipes Task'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-8363842939065966976</id><published>2010-06-02T01:06:00.000-07:00</published><updated>2010-06-10T00:54:06.481-07:00</updated><title type='text'>Incremental Parsing Using Boost Program Options Library</title><content type='html'>I have to say I fell in love with boost::program_options the first time I used it in developing my own C++ MapReduce implementation.  It can parse command line parameters as well as configuration files.  
This makes it convenient for programs that support and expect many options, of which a MapReduce program is a typical example.&lt;br /&gt;&lt;br /&gt;A special use case of a command line parser is when one function needs to parse some options out of the command line parameters and then pass the remaining parameters to another function, which parses other options.   For example, the MapReduce runtime needs options like "num_map_workers", "num_reduce_workers", etc., while the rest of the program (the user-defined map and reduce functions) needs to parse application-specific options like "topic_dirichlet_prior", "num_lda_topics", etc.  boost::program_options supports this kind of multi-round parsing; the key is the allow_unregistered() method of boost::program_options::command_line_parser. Here is a sample program. (For more explanation of this program, please refer to the official documentation of boost::program_options.)&lt;br /&gt;   &lt;pre&gt;&lt;br /&gt;&lt;span class="preprocessor"&gt;#include&lt;/span&gt; &lt;span class="string"&gt;&amp;lt;iostream&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="preprocessor"&gt;#include&lt;/span&gt; &lt;span class="string"&gt;&amp;lt;string&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="preprocessor"&gt;#include&lt;/span&gt; &lt;span class="string"&gt;&amp;lt;vector&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="preprocessor"&gt;#include&lt;/span&gt; &lt;span class="string"&gt;&amp;lt;boost/program_options/option.hpp&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="preprocessor"&gt;#include&lt;/span&gt; &lt;span class="string"&gt;&amp;lt;boost/program_options/options_description.hpp&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="preprocessor"&gt;#include&lt;/span&gt; &lt;span class="string"&gt;&amp;lt;boost/program_options/variables_map.hpp&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="preprocessor"&gt;#include&lt;/span&gt; &lt;span class="string"&gt;&amp;lt;boost/program_options/parsers.hpp&amp;gt;&lt;/span&gt;&lt;br 
/&gt;&lt;br /&gt;&lt;span class="keyword"&gt;using&lt;/span&gt; &lt;span class="keyword"&gt;namespace&lt;/span&gt; &lt;span class="constant"&gt;std&lt;/span&gt;;&lt;br /&gt;&lt;span class="keyword"&gt;namespace&lt;/span&gt; &lt;span class="constant"&gt;po&lt;/span&gt; = &lt;span class="constant"&gt;boost&lt;/span&gt;::program_options;&lt;br /&gt;&lt;br /&gt;&lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;g_num_map_workers&lt;/span&gt;;&lt;br /&gt;&lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;g_num_reduce_workers&lt;/span&gt;;&lt;br /&gt;&lt;br /&gt;&lt;span class="type"&gt;vector&lt;/span&gt;&amp;lt;&lt;span class="type"&gt;string&lt;/span&gt;&amp;gt; &lt;span class="function-name"&gt;foo&lt;/span&gt;(&lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;argc&lt;/span&gt;, &lt;span class="type"&gt;char&lt;/span&gt;** &lt;span class="variable-name"&gt;argv&lt;/span&gt;) {&lt;br /&gt; &lt;span class="constant"&gt;po&lt;/span&gt;::&lt;span class="type"&gt;options_description&lt;/span&gt; &lt;span class="variable-name"&gt;desc&lt;/span&gt;(&lt;span class="string"&gt;"Supported options"&lt;/span&gt;);&lt;br /&gt; desc.add_options()&lt;br /&gt;   (&lt;span class="string"&gt;"num_map_workers"&lt;/span&gt;, &lt;span class="constant"&gt;po&lt;/span&gt;::&lt;span class="type"&gt;value&lt;/span&gt;&amp;lt;&lt;span class="type"&gt;int&lt;/span&gt;&amp;gt;(&amp;amp;&lt;span class="variable-name"&gt;g_num_map_workers&lt;/span&gt;), &lt;span class="string"&gt;"# map workers"&lt;/span&gt;)&lt;br /&gt;   (&lt;span class="string"&gt;"num_reduce_workers"&lt;/span&gt;, &lt;span class="constant"&gt;po&lt;/span&gt;::&lt;span class="type"&gt;value&lt;/span&gt;&amp;lt;&lt;span class="type"&gt;int&lt;/span&gt;&amp;gt;(&amp;amp;&lt;span class="variable-name"&gt;g_num_reduce_workers&lt;/span&gt;), &lt;span class="string"&gt;"# reduce workers"&lt;/span&gt;)&lt;br /&gt;   ;&lt;br /&gt; &lt;span 
class="constant"&gt;po&lt;/span&gt;::&lt;span class="type"&gt;variables_map&lt;/span&gt; &lt;span class="variable-name"&gt;vm&lt;/span&gt;;&lt;br /&gt; &lt;span class="constant"&gt;po&lt;/span&gt;::&lt;span class="type"&gt;parsed_options&lt;/span&gt; &lt;span class="variable-name"&gt;parsed&lt;/span&gt; =&lt;br /&gt;   &lt;span class="constant"&gt;po&lt;/span&gt;::command_line_parser(argc, argv).options(desc).allow_unregistered().run();&lt;br /&gt; &lt;span class="constant"&gt;po&lt;/span&gt;::store(parsed, vm);&lt;br /&gt; &lt;span class="constant"&gt;po&lt;/span&gt;::notify(vm);&lt;br /&gt;&lt;br /&gt; cout &amp;lt;&amp;lt; &lt;span class="string"&gt;"The following options were parsed by foo:\n"&lt;/span&gt;;&lt;br /&gt; &lt;span class="keyword"&gt;if&lt;/span&gt; (vm.count(&lt;span class="string"&gt;"num_map_workers"&lt;/span&gt;)) {&lt;br /&gt;   cout &amp;lt;&amp;lt; &lt;span class="string"&gt;"num_map_workers = "&lt;/span&gt; &amp;lt;&amp;lt; g_num_map_workers &amp;lt;&amp;lt; &lt;span class="string"&gt;"\n"&lt;/span&gt;;&lt;br /&gt; }&lt;br /&gt; &lt;span class="keyword"&gt;if&lt;/span&gt; (vm.count(&lt;span class="string"&gt;"num_reduce_workers"&lt;/span&gt;)) {&lt;br /&gt;   cout &amp;lt;&amp;lt; &lt;span class="string"&gt;"num_reduce_workers = "&lt;/span&gt; &amp;lt;&amp;lt; g_num_reduce_workers &amp;lt;&amp;lt; &lt;span class="string"&gt;"\n"&lt;/span&gt;;&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt; &lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="constant"&gt;po&lt;/span&gt;::collect_unrecognized(parsed.options, &lt;span class="constant"&gt;po&lt;/span&gt;::include_positional);&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;span class="type"&gt;void&lt;/span&gt; &lt;span class="function-name"&gt;bar&lt;/span&gt;(&lt;span class="type"&gt;vector&lt;/span&gt;&amp;lt;&lt;span class="type"&gt;string&lt;/span&gt;&amp;gt;&amp;amp; &lt;span class="variable-name"&gt;rest_args&lt;/span&gt;) {&lt;br /&gt; &lt;span class="constant"&gt;po&lt;/span&gt;::&lt;span 
class="type"&gt;options_description&lt;/span&gt; &lt;span class="variable-name"&gt;desc&lt;/span&gt;(&lt;span class="string"&gt;"Supported options"&lt;/span&gt;);&lt;br /&gt; desc.add_options()&lt;br /&gt;   (&lt;span class="string"&gt;"apple"&lt;/span&gt;, &lt;span class="constant"&gt;po&lt;/span&gt;::&lt;span class="type"&gt;value&lt;/span&gt;&amp;lt;&lt;span class="type"&gt;int&lt;/span&gt;&amp;gt;(), &lt;span class="string"&gt;"# apples"&lt;/span&gt;)&lt;br /&gt;   ;&lt;br /&gt; &lt;span class="constant"&gt;po&lt;/span&gt;::&lt;span class="type"&gt;variables_map&lt;/span&gt; &lt;span class="variable-name"&gt;vm&lt;/span&gt;;&lt;br /&gt; &lt;span class="constant"&gt;po&lt;/span&gt;::&lt;span class="type"&gt;parsed_options&lt;/span&gt; &lt;span class="variable-name"&gt;parsed&lt;/span&gt; =&lt;br /&gt;   &lt;span class="constant"&gt;po&lt;/span&gt;::command_line_parser(rest_args).options(desc).allow_unregistered().run();&lt;br /&gt; &lt;span class="constant"&gt;po&lt;/span&gt;::store(parsed, vm);&lt;br /&gt; &lt;span class="constant"&gt;po&lt;/span&gt;::notify(vm);&lt;br /&gt;&lt;br /&gt; cout &amp;lt;&amp;lt; &lt;span class="string"&gt;"The following options were parsed by bar:\n"&lt;/span&gt;;&lt;br /&gt; &lt;span class="keyword"&gt;if&lt;/span&gt; (vm.count(&lt;span class="string"&gt;"apple"&lt;/span&gt;)) {&lt;br /&gt;   cout &amp;lt;&amp;lt; &lt;span class="string"&gt;"apple = "&lt;/span&gt; &amp;lt;&amp;lt; vm[&lt;span class="string"&gt;"apple"&lt;/span&gt;].&lt;span class="type"&gt;as&lt;/span&gt;&amp;lt;&lt;span class="type"&gt;int&lt;/span&gt;&amp;gt;() &amp;lt;&amp;lt; &lt;span class="string"&gt;"\n"&lt;/span&gt;;&lt;br /&gt; }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="function-name"&gt;main&lt;/span&gt;(&lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;argc&lt;/span&gt;, &lt;span class="type"&gt;char&lt;/span&gt;** &lt;span class="variable-name"&gt;argv&lt;/span&gt;) {&lt;br 
/&gt; &lt;span class="type"&gt;vector&lt;/span&gt;&amp;lt;&lt;span class="type"&gt;string&lt;/span&gt;&amp;gt; &lt;span class="variable-name"&gt;rest_options&lt;/span&gt; = foo(argc, argv);&lt;br /&gt;&lt;br /&gt; cout &amp;lt;&amp;lt; &lt;span class="string"&gt;"The following cmd args cannot be recognized by foo:\n"&lt;/span&gt;;&lt;br /&gt; &lt;span class="keyword"&gt;for&lt;/span&gt; (&lt;span class="type"&gt;size_t&lt;/span&gt; &lt;span class="variable-name"&gt;i&lt;/span&gt; = 0; i &amp;lt; rest_options.size(); ++i) {&lt;br /&gt;   cout &amp;lt;&amp;lt; rest_options[i] &amp;lt;&amp;lt; &lt;span class="string"&gt;"\n"&lt;/span&gt;;&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt; bar(rest_options);&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Finally, I should mention that early Boost versions (e.g., 1.33.1 as packaged in Cygwin) have bugs in program_options that lead to a core dump when unknown options are given.  The solution is to download and build your own Boost libraries.  I just built 1.43.0 with Cygwin on my Windows computer.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-8363842939065966976?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/8363842939065966976/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=8363842939065966976' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8363842939065966976'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8363842939065966976'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/06/boost-program-options-library.html' title='Incremental Parsing Using Boost Program 
Options Library'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-807169310345741579</id><published>2010-05-23T07:22:00.000-07:00</published><updated>2010-05-23T07:32:03.233-07:00</updated><title type='text'>The Efficiency of AWK Associative Array</title><content type='html'>I did a little experiment comparing a C++ STL map with string keys against the AWK associative array in counting word frequencies in large text files.  The result is astonishing: the AWK associative array is about 6 times faster than the C++ code!&lt;br /&gt;&lt;br /&gt;At first, I focused on the C++ code that splits a line into words.  I tried STL istringstream, C strtok_r, and a string splitting trick that I learned at Google.  However, these choices do not affect the efficiency significantly.&lt;br /&gt;&lt;br /&gt;Then I recalled the lesson I learned from parallel LDA (a machine learning method that has been a key part of my research for over three years): an STL map with string keys is about 10 times slower than one with int keys.  I found that this issue was thoroughly &lt;a href="http://www.drdobbs.com/article/printableArticle.jhtml?articleId=184405453&amp;amp;dept_url=/"&gt;explained&lt;/a&gt; by Lev Kochubeevsky, principal engineer at Netscape, in 2003.  
Unfortunately, it seems that no improvement to the STL string has emerged since then.&lt;br /&gt;&lt;br /&gt;On the other hand, I strongly suspect that AWK, an interpreted language, implements a trie-based data structure for maps with string keys.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-807169310345741579?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/807169310345741579/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=807169310345741579' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/807169310345741579'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/807169310345741579'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/05/efficiency-of-awk-associative-array.html' title='The Efficiency of AWK Associative Array'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-746906645756117339</id><published>2010-05-23T07:18:00.000-07:00</published><updated>2010-05-23T07:21:16.569-07:00</updated><title type='text'>SSHFS, Poor Guys' Network File-system</title><content type='html'>SSHFS is good: as long as one has SSH access, one can mount remote directories, even without the administrative access required by traditional network filesystems like NFS or 
Samba.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-746906645756117339?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/746906645756117339/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=746906645756117339' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/746906645756117339'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/746906645756117339'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/05/sshfs-poor-guys-network-file-system.html' title='SSHFS, Poor Guys&apos; Network File-system'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-3736906074302772256</id><published>2010-05-22T20:56:00.000-07:00</published><updated>2010-05-22T21:02:05.618-07:00</updated><title type='text'>Install and Configure MPICH2 on Ubuntu</title><content type='html'>The general procedure is described in&lt;br /&gt;&lt;a href="http://developer.amd.com/documentation/articles/pages/HPCHighPerformanceLinpack.aspx#four"&gt;http://developer.amd.com/documentation/articles/pages/HPCHighPerformanceLinpack.aspx#four&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I have encountered two cases where MPD on multiple nodes cannot communicate with each other:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;The 
firewall prevents such communication, and&lt;br /&gt;&lt;/li&gt;&lt;li&gt;There is ambiguity in /etc/hosts.&lt;/li&gt;&lt;/ol&gt;For 1., we can set the default iptables policies to ACCEPT (iptables is the firewall commonly used on Linux) and disable ufw (the Ubuntu firewall) with the following commands:&lt;br /&gt;&lt;pre&gt;sudo iptables -P INPUT ACCEPT&lt;br /&gt;sudo iptables -P OUTPUT ACCEPT&lt;br /&gt;sudo ufw disable&lt;/pre&gt;&lt;br /&gt;For 2., I just edited /etc/hosts to ensure all nodes are referred to by their real IP addresses, instead of loop-back style addresses.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-3736906074302772256?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/3736906074302772256/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=3736906074302772256' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/3736906074302772256'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/3736906074302772256'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/05/install-and-configure-mpich2-on-ubuntu.html' title='Install and Configure MPICH2 on Ubuntu'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-4394082163653170447</id><published>2010-05-18T00:22:00.000-07:00</published><updated>2010-05-18T01:02:09.108-07:00</updated><title type='text'>Hadoop Pipes Is Incompatible with Protocol Buffers</title><content type='html'>I just found another reason that I do not like Hadoop Pipes --- I cannot use a serialization of Google protocol buffer as map output key or value.&lt;br /&gt;&lt;br /&gt;For those who are scratching your heads for weird bugs from your Hadoop Pipes programs using Google protocol buffers, please have a look at the following sample program:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;span class="preprocessor"&gt;#include&lt;/span&gt; &lt;span class="string"&gt;&amp;lt;string&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="preprocessor"&gt;#include&lt;/span&gt; &lt;span class="string"&gt;&amp;lt;hadoop/Pipes.hh&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="preprocessor"&gt;#include&lt;/span&gt; &lt;span class="string"&gt;&amp;lt;hadoop/TemplateFactory.hh&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="preprocessor"&gt;#include&lt;/span&gt; &lt;span class="string"&gt;&amp;lt;hadoop/StringUtils.hh&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="keyword"&gt;using&lt;/span&gt; &lt;span class="keyword"&gt;namespace&lt;/span&gt; &lt;span class="constant"&gt;std&lt;/span&gt;;&lt;br /&gt;&lt;br /&gt;&lt;span class="keyword"&gt;class&lt;/span&gt; &lt;span class="type"&gt;LearnMapOutputMapper&lt;/span&gt;: &lt;span class="keyword"&gt;public&lt;/span&gt; &lt;span class="constant"&gt;HadoopPipes&lt;/span&gt;::&lt;span class="type"&gt;Mapper&lt;/span&gt; {&lt;br /&gt;&lt;span class="keyword"&gt;public&lt;/span&gt;:&lt;br /&gt;&lt;span class="function-name"&gt;LearnMapOutputMapper&lt;/span&gt;(&lt;span class="constant"&gt;HadoopPipes&lt;/span&gt;::&lt;span 
class="type"&gt;TaskContext&lt;/span&gt;&amp;amp; &lt;span class="variable-name"&gt;context&lt;/span&gt;){}&lt;br /&gt;&lt;span class="type"&gt;void&lt;/span&gt; &lt;span class="function-name"&gt;map&lt;/span&gt;(&lt;span class="constant"&gt;HadoopPipes&lt;/span&gt;::&lt;span class="type"&gt;MapContext&lt;/span&gt;&amp;amp; &lt;span class="variable-name"&gt;context&lt;/span&gt;) {&lt;br /&gt;  context.emit(&lt;span class="string"&gt;""&lt;/span&gt;, &lt;span class="string"&gt;"&lt;/span&gt;&lt;span&gt;&lt;span class="string"&gt;apple\norange&lt;/span&gt;&lt;/span&gt;&lt;span class="string"&gt;\0banana\tpapaya"&lt;/span&gt;);&lt;br /&gt;}&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;&lt;span class="keyword"&gt;class&lt;/span&gt; &lt;span class="type"&gt;LearnMapOutputReducer&lt;/span&gt;: &lt;span class="keyword"&gt;public&lt;/span&gt; &lt;span class="constant"&gt;HadoopPipes&lt;/span&gt;::&lt;span class="type"&gt;Reducer&lt;/span&gt; {&lt;br /&gt;&lt;span class="keyword"&gt;public&lt;/span&gt;:&lt;br /&gt;&lt;span class="function-name"&gt;LearnMapOutputReducer&lt;/span&gt;(&lt;span class="constant"&gt;HadoopPipes&lt;/span&gt;::&lt;span class="type"&gt;TaskContext&lt;/span&gt;&amp;amp; &lt;span class="variable-name"&gt;context&lt;/span&gt;){}&lt;br /&gt;&lt;span class="type"&gt;void&lt;/span&gt; &lt;span class="function-name"&gt;reduce&lt;/span&gt;(&lt;span class="constant"&gt;HadoopPipes&lt;/span&gt;::&lt;span class="type"&gt;ReduceContext&lt;/span&gt;&amp;amp; &lt;span class="variable-name"&gt;context&lt;/span&gt;) {&lt;br /&gt;&lt;span class="keyword"&gt;while&lt;/span&gt; (context.nextValue()) {&lt;br /&gt;  &lt;span class="type"&gt;string&lt;/span&gt; &lt;span class="variable-name"&gt;value&lt;/span&gt; = context.getInputValue(); &lt;span class="comment-delimiter"&gt;// &lt;/span&gt;&lt;span class="comment"&gt;Copy content&lt;br /&gt;&lt;/span&gt;      context.emit(context.getInputKey(), &lt;span 
class="constant"&gt;HadoopUtils&lt;/span&gt;::toString(value.size()));&lt;br /&gt;}&lt;br /&gt;}&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;&lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="function-name"&gt;main&lt;/span&gt;(&lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;argc&lt;/span&gt;, &lt;span class="type"&gt;char&lt;/span&gt; *&lt;span class="variable-name"&gt;argv&lt;/span&gt;[]) {&lt;br /&gt;&lt;span class="keyword"&gt;return&lt;/span&gt; &lt;span class="constant"&gt;HadoopPipes&lt;/span&gt;::runTask(&lt;span class="constant"&gt;HadoopPipes&lt;/span&gt;::&lt;span class="type"&gt;TemplateFactory&lt;/span&gt;&amp;lt;&lt;span class="type"&gt;LearnMapOutputMapper&lt;/span&gt;,&lt;br /&gt;                          &lt;span class="type"&gt;LearnMapOutputReducer&lt;/span&gt;&amp;gt;());&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The reducer outputs the size of each map output value, which contains special characters: a newline, a null terminator, and a tab.  If Hadoop Pipes allowed such special characters, the reducer would output 26, the length of the string&lt;span class="string"&gt;&lt;span style="font-family:monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;div style="text-align: center;"&gt;&lt;span class="string"&gt;"&lt;/span&gt;&lt;span&gt;&lt;span class="string"&gt;apple\norange&lt;/span&gt;&lt;/span&gt;&lt;span class="string"&gt;\0banana\tpapaya"&lt;/span&gt;.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Unfortunately, we see 12 in the output, which is the length of the string&lt;br /&gt;&lt;div style="text-align: center;"&gt;"&lt;span&gt;&lt;span class="string"&gt;apple\norange"&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;div style="text-align: left;"&gt;This shows that map outputs in Hadoop Pipes cannot contain the null character, which may well appear in a serialized protocol buffer, as explained in the protocol buffers encoding documentation at:&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a 
href="http://code.google.com/apis/protocolbuffers/docs/encoding.html"&gt;http://code.google.com/apis/protocolbuffers/docs/encoding.html&lt;/a&gt;&lt;br /&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;I hate Hadoop Pipes, a totally incomplete but released MapReduce API.&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-4394082163653170447?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/4394082163653170447/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=4394082163653170447' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4394082163653170447'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4394082163653170447'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/05/hadoop-pipes-is-incompatible-with.html' title='Hadoop Pipes Is Incompatible with Protocol Buffers'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-4974433986647306555</id><published>2010-05-16T18:53:00.000-07:00</published><updated>2010-05-16T19:12:58.500-07:00</updated><title type='text'>MPI-based MapReduce Implementation</title><content type='html'>Since leaving Google, I no longer have a MapReduce implementation that is as pleasant to use.  While desperately looking for a replacement, I found that people are already implementing MapReduce on top of MPI.  One 
open source implementation is:&lt;br /&gt; http://www.sandia.gov/~sjplimp/mapreduce/doc/Manual.html&lt;br /&gt;&lt;br /&gt;I downloaded it and tried it out, and found that it wraps the MapReduce API poorly.  As a result, the programming style is nothing like the way MapReduce is used at Google.  By comparison, Hadoop Pipes wraps the MapReduce API in a way that is easier to use.&lt;br /&gt;&lt;br /&gt;This raises a question: what exactly makes MapReduce so good?  Why does everyone who has used Google MapReduce like it?&lt;br /&gt;&lt;br /&gt;Many of MapReduce's advantages have already been stressed in papers and on forums: it can process massive amounts of data, and it supports automatic fault recovery.  But I think the most important point is that the MapReduce API is simple: it lets a programmer get a parallel program done by defining just a map function and a reduce function.  Users do not have to worry about distributed IO, communications, task synchronization, load balancing, or fault recovery.&lt;br /&gt;&lt;br /&gt;Many people complain that Hadoop is slow.  As a Java implementation, Hadoop is indeed the slowest of the many MapReduce implementations I have seen (in practice it is often even slower than Last.fm's Bash MapReduce), yet many people use Hadoop.  Could one of the reasons be that its API is easy to use?&lt;br /&gt;&lt;br /&gt;My feeling is that if a MapReduce implementation loses the convenience that MapReduce gives programmers, its other advantages are probably worth much less.  (More than a month after leaving Google, I can no longer remember which programs I wrote without MapReduce.)&lt;br /&gt;&lt;br /&gt;BTW, while on this topic: the API of Sphere/Sector (http://sector.sourceforge.net/doc.html) is not a MapReduce API either.  Here is a demo program pasted from the Sphere/Sector tutorial slides:&lt;br /&gt;&lt;pre&gt;&lt;dcclient.h&gt;Sector::init(); Sector::login(…)&lt;br /&gt;SphereStream input;&lt;br /&gt;SphereStream output;&lt;br /&gt;SphereProcess myProc;&lt;br /&gt;myProc.loadOperator("func.so");&lt;br /&gt;myProc.run(input, output, func, 0);&lt;br /&gt;myProc.read(result);&lt;br /&gt;myProc.close();&lt;br /&gt;Sector::logout(); Sector::close();&lt;/dcclient.h&gt;&lt;/pre&gt;As you can see, the user has to write all of the initialization, finalization, operator-loading, and similar operations.  Wrapping these into a MapReduce API poses no technical difficulty, and doing so would save users a lot of trouble.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-4974433986647306555?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/4974433986647306555/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=4974433986647306555' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4974433986647306555'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4974433986647306555'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/05/mpi-based-mapreduce-implementation.html' title='MPI-based MapReduce Implementation'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-109885054860234679</id><published>2010-04-19T08:26:00.000-07:00</published><updated>2010-05-17T20:55:48.338-07:00</updated><title type='text'>Running Hadoop on Mac OS X (Single Node)</title><content type='html'>I installed Hadoop, built its C++ components, and built and ran Pipes programs on my iMac running Snow Leopard.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Installation and Configuration&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Basically, I followed Michael G. Noll's guide, &lt;a href="http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29"&gt;Running Hadoop On Ubuntu Linux (Single-Node Cluster)&lt;/a&gt;, with two differences.&lt;br /&gt;&lt;br /&gt;On Mac OS X, we need to select Sun's JVM.  This can be done in System Preferences.  
Then, in both .bash_profile and $HADOOP_HOME/conf/hadoop-env.sh, set the JAVA_HOME environment variable:&lt;br /&gt;export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home&lt;br /&gt;&lt;br /&gt;I did not create a special account for running Hadoop. (I should, for security reasons, but I am lazy and my iMac is only for personal development, not real computing...) So I need to &lt;tt&gt;chmod a+rwx /tmp/hadoop-yiwang&lt;/tt&gt;, where &lt;tt&gt;yiwang&lt;/tt&gt; is my account name, which is also what &lt;tt&gt;${user.name}&lt;/tt&gt; refers to in &lt;tt&gt;core-site.xml&lt;/tt&gt;.&lt;br /&gt;&lt;br /&gt;After finishing installation and configuration, we should be able to start all Hadoop services, build and run Hadoop Java programs, and monitor their activities.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Building C++ Components&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Because I do not program in Java, I write Hadoop programs using Pipes.  The following steps build the Pipes C++ library on Mac OS X:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Install Xcode and open a terminal window&lt;/li&gt;&lt;li&gt;cd $HADOOP_HOME/src/c++/utils&lt;/li&gt;&lt;li&gt;./configure&lt;/li&gt;&lt;li&gt;make install&lt;/li&gt;&lt;li&gt;cd $HADOOP_HOME/src/c++/pipes&lt;/li&gt;&lt;li&gt;./configure&lt;/li&gt;&lt;li&gt;make install&lt;/li&gt;&lt;/ol&gt;Note that you must build utils before pipes.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Build and Run Pipes Programs&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The following command shows how to link against the Pipes libraries:&lt;br /&gt;&lt;pre&gt;g++ -o wordcount wordcount.cc \&lt;br /&gt;-I${HADOOP_HOME}/src/c++/install/include \&lt;br /&gt;-L${HADOOP_HOME}/src/c++/install/lib \&lt;br /&gt;-lhadooputils -lhadooppipes -lpthread&lt;br /&gt;&lt;/pre&gt;To run the program, we need a configuration file, as shown on the Apache Hadoop Wiki page.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Build libHDFS&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There are some bugs in libHDFS of 
Apache Hadoop 0.20.2, but it is easy to fix them:&lt;br /&gt;&lt;pre&gt;cd hadoop-0.20.2/src/c++/libhdfs&lt;br /&gt;./configure&lt;br /&gt;Remove #include "error.h" from hdfsJniHelper.c&lt;br /&gt;Remove -Dsize_t=unsigned int from Makefile&lt;br /&gt;make&lt;br /&gt;cp hdfs.h ../install/include/hadoop&lt;br /&gt;cp libhdfs.so ../install/lib&lt;br /&gt;&lt;/pre&gt;Since Mac OS X uses dyld to manage shared libraries, you need to specify the directory holding libhdfs.so using the environment variable DYLD_LIBRARY_PATH (LD_LIBRARY_PATH does not work):&lt;br /&gt;&lt;pre&gt;export DYLD_LIBRARY_PATH=$HADOOP_HOME/src/c++/install/lib:$DYLD_LIBRARY_PATH&lt;br /&gt;&lt;/pre&gt;You might want to add the line above to your shell configuration file (e.g., &lt;tt&gt;~/.bash_profile&lt;/tt&gt;).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-109885054860234679?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/109885054860234679/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=109885054860234679' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/109885054860234679'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/109885054860234679'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/04/running-hadoop-on-mac-os-x-single-node.html' title='Running Hadoop on Mac OS X (Single Node)'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-5726067600830540726</id><published>2010-04-11T03:09:00.000-07:00</published><updated>2010-04-11T03:28:47.102-07:00</updated><title type='text'>Get Through GFW on Mac OS X Using IPv6</title><content type='html'>In my &lt;a href="http://cxwangyi.blogspot.com/2010/01/get-across-gfw-using-tor-and-bridges.html"&gt;previous post&lt;/a&gt;, I explained how to get through the GFW on Mac OS X using Tor.  Unfortunately, it seems that Tor has been blocked by the GFW in recent months.  However, some blog posts and mailing lists claim that the GFW is not yet able to filter IPv6 packets.  So I resorted to the IPv6 tunneling protocol, &lt;a href="http://en.wikipedia.org/wiki/Teredo_tunneling"&gt;Teredo&lt;/a&gt;.  A well-known software implementation of Teredo on Linux and BSD is &lt;a href="http://www.remlab.net/miredo/"&gt;Miredo&lt;/a&gt;.  Thanks go to darco, who recently ported Miredo to Mac OS X, in particular 10.4, 10.5, and 10.6 with a 32-bit kernel.  You can drop by &lt;a href="http://www.deepdarc.com/miredo-osx/"&gt;darco's Miredo for Mac OS X page&lt;/a&gt; or just download the &lt;a href="http://www.deepdarc.com/miredo-osx-prerelease2.pkg.zip"&gt;universal installer&lt;/a&gt; directly.  After downloading, click to install, and IPv6 tunneling over IPv4 is set up on your Mac.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Before you can use IPv6 to get through the GFW, you need to know the IPv6 addresses of the sites you want to visit.  You must add these addresses to your /etc/hosts file, so that the Web browser does not need to resolve the addresses via IPv4 (which is monitored by the GFW).  
This &lt;a href="http://docs.google.com/Doc?docid=0ARhAbsvps1PlZGZrZG14bnRfNjFkOWNrOWZmcQ&amp;amp;hl=zh_CN"&gt;Google Doc&lt;/a&gt; contains the IPv6 addresses of most Google services (including YouTube).&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-5726067600830540726?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/5726067600830540726/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=5726067600830540726' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5726067600830540726'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5726067600830540726'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/04/get-through-gfw-on-mac-os-x-using-ipv6.html' title='Get Through GFW on Mac OS X Using IPv6'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-4004821575972617855</id><published>2010-03-27T18:38:00.000-07:00</published><updated>2010-03-27T18:51:00.499-07:00</updated><title type='text'>Customizing Mac OS X 10.6 For a Linux User</title><content type='html'>I have been a Linux user for years, and switched to Mac OS X 10.6 Snow Leopard in recent months.  
Here is what I have done to make Snow Leopard suit my work habits.&lt;div&gt;&lt;ul&gt;&lt;li&gt;Emacs. &lt;br /&gt;At the time of writing, I prefer Aquamacs version 20.1preview5 to the stable version 19.x.  Aquamacs comes with many useful Emacs plugins already bundled, including AUCTeX for LaTeX editing.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;PdfTk.&lt;br /&gt;Under Unix, Emacs/AUCTeX invokes pdf2dsc (a component in the pdftk package) to do inline preview in PDFLaTeX mode.  On Mac OS X, thanks go to Frédéric Wenzel, who created &lt;a href="http://fredericiana.com/2010/03/01/pdftk-1-41-for-mac-os-x-10-6/"&gt;a DMG of PdfTk&lt;/a&gt; for us.&lt;/li&gt;&lt;li&gt;LaTeX/PDF Preview.&lt;br /&gt;There is a free PDF viewer for Mac OS X, Skim, which works like the ActiveDVI Viewer under Linux, but displays PDF files instead of DVI.  Whenever you edit your LaTeX source and recompile, Skim automatically updates what it is displaying.&lt;/li&gt;&lt;li&gt;Terminal.&lt;br /&gt;Like many others, I use iTerm.  To support bash shortcut keys like Alt-f/b/d, you need to customize iTerm as suggested by many Google search results.  
In particular, remember to select "High interception priority" when you do such customization for iTerm under Snow Leopard.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-4004821575972617855?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/4004821575972617855/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=4004821575972617855' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4004821575972617855'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4004821575972617855'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/03/customizing-mac-os-x-106-for-linux-user.html' title='Customizing Mac OS X 10.6 For a Linux User'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-6672695677028574882</id><published>2010-03-27T00:18:00.000-07:00</published><updated>2010-03-27T01:01:35.784-07:00</updated><title type='text'>The Security Mechanism of Chrome</title><content type='html'>Today I saw several &lt;a href="http://mygadgetnews.com/2010/03/26/pwn2own-2010-browsers-and-iphone-get-pwned/"&gt;news&lt;/a&gt; reports: at the Pwn2Own 2010 hacking contest, of all the browsers attacked, only Google Chrome was left standing.  A quick Google search shows that many hackers attribute Chrome's security to its sandbox mechanism.  Out of curiosity, I read the documentation of Chromium (Google Chrome's open 
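source project):&lt;br /&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.chromium.org/developers/design-documents/sandbox"&gt;http://www.chromium.org/developers/design-documents/sandbox&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://dev.chromium.org/developers/design-documents/sandbox/Sandbox-FAQ"&gt;http://dev.chromium.org/developers/design-documents/sandbox/Sandbox-FAQ&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.chromium.org/developers/design-documents/sandbox"&gt;&lt;/a&gt;From these I got a rough understanding of the basic principles of the Chrome sandbox.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Chrome starts two kinds of processes, target and broker:&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;Target processes do the work that hackers can easily influence and subvert, mainly (1) interpreting JavaScript programs and (2) HTML rendering.  Target processes are created by the broker process.  At creation time, the broker strips the target process of all its access privileges.  That is, although a target process can make system calls to ask the operating system kernel to do things, just like any user-mode process, the kernel will consider the target process unprivileged and refuse.  [Note: in a modern operating system, a user process can access any system resource only by asking the kernel for help.]  So in practice a target process can get things done only by asking the broker process for help through inter-process calls.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The broker process plays the role of the operating system kernel.  Because the code it executes is written by the browser's authors, and it is hard for an attacker to inject malicious code into it, we can rely on it to check whether what a target process asks for is legitimate, and to refuse the request if it is not.&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;In short, Chrome's sandbox mechanism replicates the operating system's two-level security model: user processes (target processes) hold no real power, which stays in the hands of the kernel (the broker process).  In effect it installs a second lock, which shows its worth when misconfiguration of the operating system by users or third-party software causes the operating system's own security mechanisms to fail.&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-6672695677028574882?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/6672695677028574882/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=6672695677028574882' title='2 Comments'/><link rel='edit' type='application/atom+xml' 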
href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6672695677028574882'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6672695677028574882'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/03/chrome.html' title='The Security Mechanism of Chrome'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-5966328086303737772</id><published>2010-03-26T18:50:00.000-07:00</published><updated>2010-03-27T00:18:38.568-07:00</updated><title type='text'>Data-Intensive Text Processing with MapReduce</title><content type='html'>A book draft, Data-Intensive Text Processing with MapReduce, on parallel text algorithms with MapReduce can be found &lt;a href="http://www.umiacs.umd.edu/~jimmylin/book.html"&gt;here&lt;/a&gt;.  This book has chapters covering graph algorithms (breadth-first traversal and PageRank) and learning HMMs using EM.  The authors do a great job of presenting concepts using figures, which are comprehensive and intuitive.  &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Indeed, there is much other interesting material one could put into a book on text processing algorithms with MapReduce.  
For example, parallel latent topic models like latent Dirichlet allocation, and tree pruning/learning algorithms for various purposes.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-5966328086303737772?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/5966328086303737772/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=5966328086303737772' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5966328086303737772'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5966328086303737772'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/03/book-draft-data-intensive-text.html' title='Data-Intensive Text Processing with MapReduce'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-8594870631307974567</id><published>2010-03-26T08:47:00.000-07:00</published><updated>2010-03-26T18:24:54.396-07:00</updated><title type='text'>Stochastic Gradient Tree Boosting</title><content type='html'>The basic idea of treating boosting as functional gradient descent, with trees as the stages/steps, is known as gradient boosting and is presented in a Stanford paper:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Jerome Friedman. Greedy Function Approximation: A Gradient Boosting Machine. 
The Annals of Statistics. 2001&lt;/li&gt;&lt;/ul&gt;The same author wrote a note on extending gradient boosting into its stochastic version, stochastic gradient boosting:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Jerome Friedman. Stochastic Gradient Boosting. 1999.&lt;/li&gt;&lt;/ul&gt;(Stochastic) gradient boosting uses regression/classification trees as base learners, and needs to learn trees during training. If you are interested in distributed learning of trees using MapReduce, you might want to refer to a recent Google paper:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce. VLDB 2009&lt;/li&gt;&lt;/ul&gt;A recent Yahoo paper shows how to implement stochastic boosted decision trees using MPI and Hadoop:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Stochastic Gradient Boosted Distributed Decision Trees. CIKM 2009&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-8594870631307974567?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/8594870631307974567/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=8594870631307974567' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8594870631307974567'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8594870631307974567'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/03/stochastic-gradient-tree-boosting.html' title='Stochastic Gradient Tree Boosting'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-1438137509708605947</id><published>2010-03-24T19:38:00.000-07:00</published><updated>2010-03-24T19:43:29.189-07:00</updated><title type='text'>Fwd: Ten Commands Every Linux Developer Should Know</title><content type='html'>I like &lt;a href="http://www.linuxjournal.com/article/7330"&gt;this article&lt;/a&gt; in Linux Journal, which introduces some very useful Linux commands that I had never used in my years of experience with Unix systems.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-1438137509708605947?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/1438137509708605947/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=1438137509708605947' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1438137509708605947'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1438137509708605947'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/03/fwd-ten-commands-every-linux-developer.html' title='Fwd: Ten Commands Every Linux Developer Should Know'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-8559216752979644530</id><published>2010-03-15T20:40:00.000-07:00</published><updated>2010-03-15T21:01:36.670-07:00</updated><title type='text'>Some Interesting Data Sets</title><content type='html'>&lt;ul&gt;&lt;li&gt;The ArXiv data set, including full text, abstract/title and citations:&lt;br /&gt;&lt;a href="http://www.cs.cornell.edu/projects/kddcup/datasets.html"&gt;http://www.cs.cornell.edu/projects/kddcup/datasets.html&lt;/a&gt;&lt;/li&gt;&lt;li&gt;The CiteSeer dataset, including Dublin core standard fields and citations:&lt;br /&gt;&lt;a href="http://citeseer.ist.psu.edu/oai.html"&gt;http://citeseer.ist.psu.edu/oai.html&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-8559216752979644530?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/8559216752979644530/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=8559216752979644530' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8559216752979644530'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8559216752979644530'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/03/some-interesting-data-sets.html' title='Some Interesting Data Sets'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-128210449578432098</id><published>2010-02-27T07:17:00.001-08:00</published><updated>2010-02-27T07:37:55.309-08:00</updated><title type='text'>Could Latent Dirichlet Allocation Handle Documents with Various Length?</title><content type='html'>I heard some of my colleagues, who are working on another latent topic model different from LDA, complain that LDA prefers documents of similar lengths.  I agree with this, but I feel that it can be fixed easily.  Here is what I think.&lt;br /&gt;&lt;br /&gt;The Gibbs sampling algorithm of LDA samples latent topic assignments as follows:&lt;br /&gt;&lt;a href="http://www.codecogs.com/eqnedit.php?latex=z_i \sim P(w|z)P(z|d) \propto \frac{N(w,z) @plus; \beta}{N(z) @plus; V\beta} \frac{N(z,d) @plus; \alpha}{L_d @plus; \alpha}" target="_blank"&gt;&lt;img src="http://latex.codecogs.com/gif.latex?z_i \sim P(w|z)P(z|d) \propto \frac{N(w,z) + \beta}{N(z) + V\beta} \frac{N(z,d) + \alpha}{L_d + \alpha}" title="z_i \sim P(w|z)P(z|d) \propto \frac{N(w,z) + \beta}{N(z) + V\beta} \frac{N(z,d) + \alpha}{L_d + \alpha}" /&gt;&lt;/a&gt;&lt;br /&gt;where V is the vocabulary size and L&lt;sub&gt;d&lt;/sub&gt; is the length of document d.&lt;br /&gt;&lt;br /&gt;The second term depends on the document length.  Consider an example document about two topics, A and B, in which half of the words are assigned to topic A and the other half to topic B.  The P(z|d) distribution then has two high bins (height proportional to L&lt;sub&gt;d&lt;/sub&gt;/2 + alpha), and short bins everywhere else (height proportional to alpha).  
So if the document has 1000 words, alpha has a negligible effect on the shape of P(z|d); but if the document contains only 2 words, alpha has a much larger effect on the shape of P(z|d).&lt;br /&gt;&lt;br /&gt;An intuitive solution to the above problem is to use a small alpha for short documents and a large alpha for long ones.  But would this break the mathematical assumptions underlying LDA?  No, because it is equivalent to using different symmetric Dirichlet priors for documents of different lengths.  This does not break the Dirichlet-multinomial conjugacy required by LDA's Gibbs sampling algorithm; it just expresses a little more prior knowledge than using the same symmetric prior for all documents.  Let us set &lt;br /&gt;&lt;a href="http://www.codecogs.com/eqnedit.php?latex=\alpha_d = k L_d" target="_blank"&gt;&lt;img src="http://latex.codecogs.com/gif.latex?\alpha_d = k L_d" title="\alpha_d = k L_d" /&gt;&lt;/a&gt;&lt;br /&gt;for each document.  Users then specify the parameter k, just as they previously specified alpha.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-128210449578432098?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/128210449578432098/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=128210449578432098' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/128210449578432098'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/128210449578432098'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/02/could-latent-dirichlet-hanlde-documents.html' title='Could Latent Dirichlet Allocation Handle Documents with 
Various Length?'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-5123702276978050128</id><published>2010-02-21T23:33:00.000-08:00</published><updated>2010-02-21T23:47:52.954-08:00</updated><title type='text'>S-shaped Functions</title><content type='html'>The logistic sigmoid function (the inverse of the logit function):&lt;br /&gt;&lt;a href="http://www.codecogs.com/eqnedit.php?latex=%5Csigma%28x%29%20=%20%5Cfrac%7B1%7D%7B1%20@plus;%20%5Cexp%28-x%29%7D" target="_blank"&gt;&lt;img src="http://latex.codecogs.com/gif.latex?%5Csigma%28x%29%20=%20%5Cfrac%7B1%7D%7B1%20+%20%5Cexp%28-x%29%7D" title="\sigma(x) = \frac{1}{1 + \exp(-x)}" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The tanh function:&lt;br /&gt;&lt;a href="http://www.codecogs.com/eqnedit.php?latex=%5Ctanh%28x%29%20=%202%5Csigma%282x%29%20-%201" target="_blank"&gt;&lt;img src="http://latex.codecogs.com/gif.latex?%5Ctanh%28x%29%20=%202%5Csigma%282x%29%20-%201" title="\tanh(x) = 2\sigma(2x) - 1" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The inverse probit function (the standard normal CDF):&lt;br /&gt;&lt;a href="http://www.codecogs.com/eqnedit.php?latex=%5CPhi%28x%29%20=%20%5Cint_%7B-%5Cinfty%7D%5Ex%20%5Cmathcal%7BN%7D%28%5Ctheta;0,1%29%20%5C;%5Cmathrm%7Bd%7D%5Ctheta" target="_blank"&gt;&lt;img src="http://latex.codecogs.com/gif.latex?%5CPhi%28x%29%20=%20%5Cint_%7B-%5Cinfty%7D%5Ex%20%5Cmathcal%7BN%7D%28%5Ctheta;0,1%29%20%5C;%5Cmathrm%7Bd%7D%5Ctheta" title="\Phi(x) = \int_{-\infty}^x \mathcal{N}(\theta;0,1) \;\mathrm{d}\theta" /&gt;&lt;/a&gt;&lt;br /&gt;where&lt;br /&gt;&lt;a href="http://www.codecogs.com/eqnedit.php?latex=%5CPhi%5Cleft%28%5Csqrt%7B%5Cfrac%7B%5Cpi%7D%7B8%7D%7Dx%5Cright%29%20%5Capprox%20%5Csigma%28x%29" 
target="_blank"&gt;&lt;img src="http://latex.codecogs.com/gif.latex?%5CPhi%5Cleft%28%5Csqrt%7B%5Cfrac%7B%5Cpi%7D%7B8%7D%7Dx%5Cright%29%20%5Capprox%20%5Csigma%28x%29" title="\Phi\left(\sqrt{\frac{\pi}{8}}x\right) \approx \sigma(x)" /&gt;&lt;/a&gt;&lt;br /&gt;For more on this approximation, see Figure 4.9 of Pattern Recognition and Machine Learning.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-5123702276978050128?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/5123702276978050128/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=5123702276978050128' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5123702276978050128'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5123702276978050128'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/02/s-shaped-functions.html' title='S-shaped Functions'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-8893088419486339453</id><published>2010-02-20T01:23:00.000-08:00</published><updated>2010-02-20T01:54:34.148-08:00</updated><title type='text'>Cavendish Experiment and Modern Machine Learning</title><content type='html'>This is just a joke, so do not take it seriously...&lt;br /&gt;&lt;br /&gt;The very 
usual case in modern machine learning is as follows:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Design a model to describe the data. For example, suppose some 2D points are generated along a quadratic curve y = a*x^2 + b*x + c.&lt;/li&gt;&lt;li&gt;Design an algorithm that estimates the model parameters (in our case a, b, and c) from a set of data (observations) x_1,y_1,x_2,y_2,...,x_n,y_n.&lt;/li&gt;&lt;li&gt;Use the model parameters in some way, say, to predict the y corresponding to a new x.&lt;/li&gt;&lt;/ol&gt;So, when I was reading Prof. Feynman's lecture notes, which mention the &lt;a href="http://en.wikipedia.org/wiki/Cavendish_experiment"&gt;Cavendish experiment&lt;/a&gt;, I thought this experiment was a kind of "learning using machines": using specially designed equipment (a machine), Cavendish measured the gravitational constant G in Newton's law of universal gravitation:&lt;br /&gt;&lt;pre&gt;F = G * m1 * m2 / r^2&lt;/pre&gt;Using the estimated model parameter G, we can do something interesting. For example, we can measure the mass of the earth by measuring the weight (gravity) F of a small ball of known mass m1 and plugging F, m1, and the radius of the earth r back into the equation to get m2, the mass of the earth.&lt;br /&gt;&lt;br /&gt;However, as I said, this is a joke, so you cannot use it in your lecture notes on machine learning.  In fact, Cavendish did not measure G, contrary to what many textbooks state.  Instead, he measured the earth directly by comparing (1) the force with which a big ball of known mass attracts a small ball with (2) the force with which the earth attracts the same small ball.  
If the ratio (2)/(1) is N, then the earth is N times as heavy as the big ball.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-8893088419486339453?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/8893088419486339453/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=8893088419486339453' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8893088419486339453'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8893088419486339453'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/02/cavendish-experiment-and-modern-machine.html' title='Cavendish Experiment and Modern Machine Learning'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-8077882534999583307</id><published>2010-02-14T18:29:00.000-08:00</published><updated>2010-02-14T18:32:21.126-08:00</updated><title type='text'>Highlights in LaTeX</title><content type='html'>To highlight part of the text in LaTeX, load the following two packages:&lt;br /&gt;&lt;pre&gt;\usepackage{color}&lt;br /&gt;\usepackage{soul}&lt;br /&gt;&lt;/pre&gt;Then, in the text, use the macro &lt;tt&gt;\hl&lt;/tt&gt;:&lt;br /&gt;&lt;pre&gt;The authors use \hl{$x=100$ in their demonstration}.&lt;br /&gt;&lt;/pre&gt;Note that if you use 
only &lt;tt&gt;soul&lt;/tt&gt; without &lt;tt&gt;color&lt;/tt&gt;, &lt;tt&gt;\hl&lt;/tt&gt; just falls back to underlining.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-8077882534999583307?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/8077882534999583307/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=8077882534999583307' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8077882534999583307'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8077882534999583307'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/02/highlights-in-latex.html' title='Highlights in LaTeX'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-4216012513805217946</id><published>2010-02-08T02:27:00.000-08:00</published><updated>2010-02-08T02:28:42.842-08:00</updated><title type='text'>A Tutorial on Network Traffic Analysis</title><content type='html'>&lt;div&gt; &lt;a href="http://www.chrissanders.org/?p=47"&gt;http://www.chrissanders.org/?p=47&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-4216012513805217946?l=cxwangyi.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/4216012513805217946/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=4216012513805217946' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4216012513805217946'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4216012513805217946'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/02/tutorial-on-network-traffic-analysis.html' title='A Tutorial on Network Traffic Analysis'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-7309120058696311769</id><published>2010-02-04T19:35:00.000-08:00</published><updated>2010-02-04T19:39:02.172-08:00</updated><title type='text'>Google Puts New Focus on Outside Research</title><content type='html'>It was recently reported that &lt;a href="http://bits.blogs.nytimes.com/2010/02/01/google-extends-outside-research-funding-to-new-fields/"&gt;Google is stepping up its funding&lt;/a&gt; to support research in the following four areas:&lt;ul&gt;&lt;li&gt;machine learning&lt;/li&gt;&lt;li&gt;the use of cellphones as data collection devices in science&lt;/li&gt;&lt;li&gt;energy efficiency&lt;/li&gt;&lt;li&gt;privacy&lt;/li&gt;&lt;/ul&gt;Among these four areas, machine learning is at the top. "Three years ago, three of the four research areas would not have been on the company’s priority list", Mr. 
Spector said. "The only one that was a priority then and now is machine learning, a &lt;em&gt;vital ingredient in search technology&lt;/em&gt;."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-7309120058696311769?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/7309120058696311769/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=7309120058696311769' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7309120058696311769'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7309120058696311769'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/02/google-puts-new-focus-on-outside.html' title='Google Puts New Focus on Outside Research'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-5631501397914353989</id><published>2010-02-03T23:31:00.000-08:00</published><updated>2010-02-03T23:35:06.722-08:00</updated><title type='text'>Reduce Data Correlation by Recenter and Rescale</title><content type='html'>In the &lt;a href="http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/bayesdemo.html"&gt;MATLAB statistics demo document&lt;/a&gt;, the training data (a set of car weights) are recentered and rescaled as follows:&lt;br 
/&gt;&lt;pre&gt;% A set of car weights&lt;br /&gt;weight = [2100 2300 2500 2700 2900 3100 3300 3500 3700 3900 4100 4300]';&lt;br /&gt;weight = (weight-2800)/1000;     % recenter and rescale&lt;/pre&gt;The document explains the reason for recentering and rescaling as follows:&lt;blockquote&gt;The data include observations of weight, number of cars tested, and number failed. We will work with a transformed version of the weights to reduce the correlation in our estimates of the regression parameters.&lt;/blockquote&gt;Could anyone tell me why recentering and rescaling reduce the correlation?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-5631501397914353989?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/5631501397914353989/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=5631501397914353989' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5631501397914353989'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5631501397914353989'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/02/reduce-data-correlation-by-recenter-and.html' title='Reduce Data Correlation by Recenter and Rescale'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-8625223328471240243</id><published>2010-02-02T19:27:00.001-08:00</published><updated>2010-02-02T19:27:52.611-08:00</updated><title type='text'>Using aMule with VeryCD</title><content type='html'>http://hi.baidu.com/linsir/blog/item/c4b54839805a9af73a87cea2.html&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-8625223328471240243?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/8625223328471240243/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=8625223328471240243' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8625223328471240243'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8625223328471240243'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/02/using-amule-with-verycd.html' title='Using aMule with VeryCD'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-6593114091703419637</id><published>2010-02-02T00:45:00.000-08:00</published><updated>2010-02-02T01:54:03.321-08:00</updated><title type='text'>Making Videos Playable on Android and 
iPhone</title><content type='html'>&lt;div&gt;You might want to convert your home-made video (no pirated video :-!) into a format that your Android phone can play.  The video formats that Android supports are listed on the Android developers' site: &lt;/div&gt;&lt;div&gt;    &lt;a href="http://developer.android.com/guide/appendix/media-formats.html"&gt;http://developer.android.com/guide/appendix/media-formats.html&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Among the listed formats, H.264 (MPEG-4 Part 10, also known as AVC) has been well accepted by the industry.  Companies including Apple have switched to it.  In the following, I will show how to use open source software on a Linux box to convert your video into an MP4 file with H.264 (AVC) video and AAC audio.  I took the &lt;a href="http://www.linuxjournal.com/article/9005"&gt;following post&lt;/a&gt; as a reference, but with updates.&lt;/div&gt;&lt;br /&gt;&lt;div&gt;First of all, you need to install the following software packages:&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;mplayer: a multimedia player&lt;/li&gt;&lt;li&gt;mencoder: MPlayer's movie encoder&lt;/li&gt;&lt;li&gt;faac: an AAC audio encoder&lt;/li&gt;&lt;li&gt;gpac: a small and flexible implementation of the MPEG-4 system standard&lt;/li&gt;&lt;li&gt;x264: a video encoder for the H.264/MPEG-4 AVC standard&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;Then follow these steps to convert &lt;tt&gt;video.avi&lt;/tt&gt; into a .mp4 file in H.264 format.&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;Extract the audio from &lt;tt&gt;video.avi&lt;/tt&gt; using MPlayer:&lt;pre&gt;mplayer -ao pcm -vc null -vo null video.avi&lt;/pre&gt;This generates an &lt;tt&gt;audiodump.wav&lt;/tt&gt; file.&lt;br /&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Encode &lt;tt&gt;audiodump.wav&lt;/tt&gt; into AAC format:&lt;pre&gt;faac --mpeg-vers 4 audiodump.wav&lt;/pre&gt;This generates an &lt;tt&gt;audiodump.aac&lt;/tt&gt; file.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Use 
&lt;tt&gt;mencoder&lt;/tt&gt; to convert the video content of &lt;tt&gt;video.avi&lt;/tt&gt; into YUV 4:2:0 format, and use &lt;tt&gt;x264&lt;/tt&gt; to encode the output into AVC format:&lt;br /&gt;&lt;pre&gt;mkfifo tmp.fifo.yuv&lt;br /&gt;mencoder -vf scale=800:450,format=i420 \&lt;br /&gt; -nosound -ovc raw -of rawvideo \&lt;br /&gt; -ofps 23.976 -o tmp.fifo.yuv video.avi > /dev/null 2>&amp;1 &amp;&lt;br /&gt;x264 -o max-video.mp4 --fps 23.976 --bframes 2 \&lt;br /&gt; --progress --crf 26 --subme 6 \&lt;br /&gt; --analyse p8x8,b8x8,i4x4,p4x4 \&lt;br /&gt; --no-psnr tmp.fifo.yuv 800x450&lt;br /&gt;rm tmp.fifo.yuv&lt;/pre&gt;We create a named pipe to connect &lt;tt&gt;mencoder&lt;/tt&gt; and &lt;tt&gt;x264&lt;/tt&gt;, so that the raw video never has to be stored on disk. These command lines generate content that is both &lt;em&gt;Quicktime-compatible&lt;/em&gt; and &lt;em&gt;H.264-compatible&lt;/em&gt;, because Apple Quicktime can now hold H.264 content. Be sure to specify the same video size for &lt;tt&gt;mencoder&lt;/tt&gt; and &lt;tt&gt;x264&lt;/tt&gt;.  
In the example above, the size is 800x450.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Merge the AAC audio and AVC video into a &lt;tt&gt;.mp4&lt;/tt&gt; file using gpac:&lt;pre&gt;MP4Box -add max-video.mp4 -add audiodump.aac \&lt;br /&gt;  -fps 23.976 max-x264.mp4&lt;/pre&gt;MP4Box is a tool in the gpac package.&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-6593114091703419637?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/6593114091703419637/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=6593114091703419637' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6593114091703419637'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6593114091703419637'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/02/convert-video-into-android-format.html' title='Making Videos Playable on Android and iPhone'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-5479701178308549858</id><published>2010-02-02T00:38:00.001-08:00</published><updated>2010-02-02T01:26:16.365-08:00</updated><title type='text'>How to Set Up a Web Server on Mac OS X</title><content 
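type='html'>I recently set up a Web server on my iMac to manage my own technical notes. Like many people, my machine goes online through a wireless router connected to ADSL, so the problems I ran into should be fairly typical. I therefore wrote down the setup procedure to share with everyone.&lt;br /&gt;&lt;h2&gt;Starting Apache on Mac OS X&lt;/h2&gt;Mac OS X ships with Apache, and starting it is easy. As shown in the figure below: open System Preferences, choose Sharing in the Internet &amp;amp; Wireless category, and check Web Sharing. Apache is now running.&lt;div&gt;&lt;div id="y2pp" style="TEXT-ALIGN:center"&gt;&lt;br /&gt;&lt;a href="http://docs.google.com/File?id=dd4z727q_1179qzvspff_b" target="_blank"&gt;&lt;img src="http://docs.google.com/File?id=dd4z727q_1179qzvspff_b" style="WIDTH:648px; HEIGHT:543.176471px" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;You can now enter http://localhost/~wangyi in the address bar of a Web browser to reach your own home page. (Note that wangyi is my user name on my iMac; replace it with your own.) The HTML file behind this page is /Users/wangyi/Sites/index.html. You can edit it to customize your home page.&lt;br /&gt;&lt;h2&gt;Letting Everyone Access My Home Page&lt;/h2&gt;My home goes online through China Telecom's ADSL service. So that several computers at home can all go online, I attached a wireless router to the ADSL modem. When a computer at home boots, the wireless router dynamically assigns it an IP address, while the router's own external IP address is assigned by China Telecom through the ADSL service. To let Internet users reach my Web service through my external IP, I need to configure port forwarding on the wireless router, so that the router forwards requests from Internet users to the Web server program on my iMac.&lt;br /&gt;&lt;br /&gt;I use a TP-LINK TL-WR541G+ wireless router. To open its configuration interface, just enter http://192.168.1.1 in the browser. (192.168.1.1 is the internal IP address of my wireless router; the internal IP address of my iMac is 192.168.1.100. These are the defaults of TP-LINK wireless routers.) As shown in the figure below:&lt;div&gt;&lt;div id="u0yo" style="TEXT-ALIGN:center"&gt;&lt;a href="http://docs.google.com/File?id=dd4z727q_118hhsx2kv7_b" target="_blank"&gt;&lt;img src="http://docs.google.com/File?id=dd4z727q_118hhsx2kv7_b" style="WIDTH:648px; HEIGHT:467.64532px" /&gt;&lt;/a&gt;&lt;/div&gt;In the "Virtual Applications" item under "Forwarding Rules", set both the "Trigger Port" and the "Open Port" to 80 (the standard port number of HTTP). Internet users can then enter my external IP in a browser to reach my Web service.&lt;br /&gt;&lt;h2&gt;Setting Up a Domain Name&lt;/h2&gt;&lt;br /&gt;The problem now is that people usually do not know my external IP, because it is assigned to my wireless router by my ISP (China Telecom). That is fine: I know my external IP, so I only need to register a domain name that points to it and then publish the domain name.&lt;br /&gt;&lt;br /&gt;Registering a domain name usually costs money. In China, only one company, 3322.org, offers free domain name registration. To save money, I use 3322.org. To do so, visit www.3322.org:&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;On the home page of www.3322.org, register a free account. My user name is cxwangyi.&lt;/li&gt;&lt;br 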
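/&gt;&lt;li&gt;Go to the "My Console" page and click "New" under "Dynamic Domain Names". Then choose a domain suffix; 3322.org offers several, and I chose 7766.org. My domain name is my 3322.org user name plus the chosen suffix, that is, cxwangyi.7766.org. This configuration page detects our external IP address automatically, so there is no need to type it in. All other options can keep their default values. A screenshot follows:&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;h2&gt;Updating the External IP Address&lt;/h2&gt;Most ISPs (including China Telecom) assign IP addresses using DHCP. This means that our IP address changes every so often, so the IP address bound to cxwangyi.7766.org in the previous step may later be assigned to someone else's machine. We therefore need to notify 3322.org of our latest IP address from time to time. A clumsy way is to visit the settings page shown above every so often and update the IP address by hand. A smarter way is to download the 3322.org client program, which reports our IP address to 3322.org automatically while it is running. A third way is to use a standard tool to request a URL that 3322.org reserves for this purpose; our IP address then travels to 3322.org along with the HTTP request and is recorded there. The 3322.org site suggests lynx; the corresponding command line is:&lt;pre&gt;lynx -mime_header -auth=cxwangyi:123456 \&lt;br /&gt;"http://www.3322.org/dyndns/update?system=dyndns&amp;amp;hostname=cxwangyi.7766.org"&lt;/pre&gt;Here cxwangyi is the user name I registered at 3322.org and 123456 is the corresponding password. cxwangyi.7766.org is the domain name registered at 3322.org in the previous step. Replace all of these with your own.&lt;br /&gt;&lt;br /&gt;Many systems (including Mac OS X) do not ship with lynx, but do ship with a simpler standard program called curl. The curl command line that reports the IP address to 3322.org is:&lt;pre&gt;curl -u cxwangyi:123456 \&lt;br /&gt;"http://www.3322.org/dyndns/update?system=dyndns&amp;amp;hostname=cxwangyi.7766.org"&lt;br /&gt;&lt;/pre&gt;Using curl or lynx, the following very simple Bash script reports the current IP address to 3322.org every 10 seconds:&lt;pre&gt;while true; do \&lt;br /&gt;sleep 10; \&lt;br /&gt;curl -u cxwangyi:123456 \&lt;br /&gt;"http://www.3322.org/dyndns/update?system=dyndns&amp;amp;hostname=cxwangyi.7766.org"; \&lt;br /&gt;done&lt;/pre&gt;&lt;h2&gt;Creating Technical Content with Emacs Muse&lt;/h2&gt;With the Web service in place, we still need content. There are countless tools for building Web pages. I use Emacs Muse, an Emacs plug-in that lets users write content in a simple wiki syntax (including figures and even complex math equations) and publish the result as HTML (or PDF and other formats). For downloading and installing Emacs Muse, see the instructions on its home page. After installing it, I added the following settings to my .emacs file:&lt;blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none"&gt;&lt;div&gt;&lt;div&gt;(add-to-list 'load-path "~/.emacs.d/lisp/muse")&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;(require 'muse-mode)     ; load authoring mode&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;(require 'muse-html)     ; load publishing styles I use&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;(require 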
'muse-latex)&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;(require 'muse-texinfo)&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;(require 'muse-docbook)&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;(require 'muse-latex2png) ; display LaTeX math equations&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;(setq muse-latex2png-scale-factor 1.4) ; the scaling of equation images.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;(require 'muse-project)  ; publish files in projects&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;(muse-derive-style&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt; "technotes-html" "html"&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt; :style-sheet "&amp;lt;link rel=\"stylesheet\" type=\"text/css\" media=\"all\" href=\"../css/wangyi.css\" /&amp;gt;")&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;(setq muse-project-alist&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;      '(("technotes" ("~/TechNotes" :default "index")&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;         (:base "technotes-html" :path "~/Sites/TechNotes"))))&lt;/div&gt;&lt;/div&gt;&lt;/blockquote&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;Here ~/.emacs.d/lisp/muse is the directory where I installed Muse, and ~/TechNotes is the directory that stores my technical documents. Each document is a text file with the .muse suffix in this directory (for example, HowToSetup.muse). When I edit such a file in Emacs, pressing control-c control-p makes Emacs Muse publish the document as HTML into the ~/Sites/TechNotes directory (as HowToSetup.html).&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-5479701178308549858?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/5479701178308549858/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=5479701178308549858' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5479701178308549858'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5479701178308549858'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/02/mac-os-xweb.html' title='How to Set Up a Web Server on Mac OS X'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-5975315182005789745</id><published>2010-01-20T00:28:00.000-08:00</published><updated>2010-01-20T00:40:27.827-08:00</updated><title type='text'>Unix Philosophy</title><content type='html'>Comparing Windows and Linux has long been a difficult problem for me.  My view has been that in the Unix world, people write small and simple programs that work together via standard linkages such as pipes and other inter-process communication mechanisms, whereas under Windows, people tend to write one huge program that does everything.  &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;An example is word processing.  In the Windows world, we have Microsoft Word, which has a huge number of functions: editing, rendering (WYSIWYG), spell checking, printing, and much more. 
In the Unix world, however, we use the TeX system, which consists of many programs that each do one simple thing: TeX macros define basic typesetting functions, LaTeX defines more, Emacs (or any other editor) edits, pdfLaTeX (and other converters) converts sources into PDF or other formats, AUCTeX or LyX provides WYSIWYG editing, and so on.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;From the above, I think I have at least grasped the Separation and Composition philosophy of the Unix world.  However, there are many more principles that I have not been able to summarize.  Luckily, a master has summarized them for us, so please refer to the great book &lt;i&gt;&lt;a href="http://www.faqs.org/docs/artu/ch01s06.html"&gt;The Art of Unix Programming&lt;/a&gt;&lt;/i&gt;.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-5975315182005789745?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/5975315182005789745/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=5975315182005789745' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5975315182005789745'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5975315182005789745'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/01/unix-philosophy.html' title='Unix Philosophy'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-9167634611744917532</id><published>2010-01-19T19:18:00.000-08:00</published><updated>2010-01-19T19:30:14.922-08:00</updated><title type='text'>Hierarchical Classification</title><content type='html'>A 2005 paper, &lt;a href="http://www.blogger.com/www.cs.iastate.edu/~honavar/Papers/FeihongSARA05.pdf"&gt;Learning Classifiers Using Hierarchically Structured Class Taxonomies&lt;/a&gt;, discusses classification into a taxonomy. My general understanding is that this problem can be solved by training a set of binary classifiers, as in the multi-label classification problem.  The paper gives more details:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Types of Classification:&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;Traditional: classify instances into mutually exclusive class labels,&lt;/li&gt;&lt;li&gt;Multi-label: an instance may have more than one label,&lt;/li&gt;&lt;li&gt;Taxonomic: multi-label, with labels drawn from a hierarchical taxonomy.&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;Solutions Proposed:&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;Binarized: train a set of binary classifiers, one for each label in the taxonomy.  At classification time, if an instance does not belong to class C, there is no need to check it against the classifiers for descendants of C.&lt;/li&gt;&lt;li&gt;Split-based: I need to read more to understand this solution.  &lt;/li&gt;&lt;/ol&gt;&lt;div&gt;From the experimental results, it seems that the above two solutions have similar performance.  
Both differ from the bottom-up solution that I saw in Google.&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-9167634611744917532?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/9167634611744917532/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=9167634611744917532' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/9167634611744917532'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/9167634611744917532'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/01/hierarchical-classification.html' title='Hierarchical Classification'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-1690543953568449588</id><published>2010-01-19T19:05:00.000-08:00</published><updated>2010-01-19T19:30:07.369-08:00</updated><title type='text'>Something New about LDA and HDP</title><content type='html'>The UC Irvine team has updated their work and published a JMLR 2009 paper: &lt;a href="http://jmlr.csail.mit.edu/papers/v10/newman09a.html"&gt;Distributed Algorithms for Topic Models&lt;/a&gt;.  For HDP, they proposed a greedy approach to matching new topics.  
I also like the way they visualize the training process.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Diane Hu, a PhD student working on latent topic models for musical analysis, recently wrote a tutorial/survey on LDA, &lt;a href="http://www.blogger.com/cseweb.ucsd.edu/~dhu/docs/research_exam09.pd"&gt;Latent Dirichlet Allocation for Text, Images, and Music&lt;/a&gt;, which introduces LDA basics as well as its extensions to images and music. &lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-1690543953568449588?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/1690543953568449588/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=1690543953568449588' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1690543953568449588'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1690543953568449588'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/01/large-scale-lda-and-hdp.html' title='Something New about LDA and HDP'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-1102795271095733986</id><published>2010-01-19T17:03:00.000-08:00</published><updated>2010-01-22T06:27:49.731-08:00</updated><title type='text'>SSH on Mac OS X</title><content 
type='html'>This &lt;a href="http://www.stocksy.co.uk/articles/Mac/ssh_on_mac_os_x/"&gt;article&lt;/a&gt; shows how to start an SSH server on Mac OS X, how to set up passwordless login, and how to tunnel insecure protocols over SSH.&lt;br /&gt;&lt;br /&gt;This other &lt;a href="http://www.macosxhints.com/article.php?story=20050707140439980"&gt;article&lt;/a&gt; explains how to change the SSH port number on Mac OS X.  Note that since Mac OS X 10.4, &lt;tt&gt;sshd&lt;/tt&gt; has been launched by &lt;tt&gt;launchd&lt;/tt&gt; instead of &lt;tt&gt;xinetd&lt;/tt&gt;, so changing the port number has become a little harder.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-1102795271095733986?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/1102795271095733986/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=1102795271095733986' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1102795271095733986'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1102795271095733986'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/01/ssh-on-mac-os-x.html' title='SSH on Mac OS X'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-2991575931719878001</id><published>2010-01-18T07:55:00.000-08:00</published><updated>2010-01-18T08:12:19.902-08:00</updated><title type='text'>Get Across GFW Using Tor and Bridges</title><content type='html'>I rarely write in Chinese on this tech blog, but this post is probably needed only by users in China.&lt;br /&gt;&lt;br /&gt;As an Internet user in China, I have to get across the firewall just to visit this tech blog of mine. On my iMac and MacBook Pro, Tor is a very good tool for that. Unfortunately, it stopped working well recently. I looked into it a little this evening and found that the problem can be solved by adding bridges.&lt;br /&gt;&lt;br /&gt;The steps are:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Send an email to bridges@torproject.org; the content does not matter. A little later you will receive a reply that lists several usable bridges.&lt;/li&gt;&lt;li&gt;In the Vidalia window, choose Settings; in the Settings dialog, switch to the Network tab; check "My ISP blocks connections to the Tor network"; then add the bridges from the email one at a time using the "Add a Bridge" input box. See the following screenshot:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_shbz2iI4vAY/S1SIHgVhBUI/AAAAAAAAFa4/yVDinoIWhfs/s1600-h/Screen+shot+2010-01-19+at+12.06.08+AM.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 378px;" src="http://2.bp.blogspot.com/_shbz2iI4vAY/S1SIHgVhBUI/AAAAAAAAFa4/yVDinoIWhfs/s400/Screen+shot+2010-01-19+at+12.06.08+AM.png" alt="" id="BLOGGER_PHOTO_ID_5428113113408931138" border="0" /&gt;&lt;/a&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-2991575931719878001?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/2991575931719878001/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=2991575931719878001' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2991575931719878001'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2991575931719878001'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/01/get-across-gfw-using-tor-and-bridges.html' title='Get Across GFW Using Tor and Bridges'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_shbz2iI4vAY/S1SIHgVhBUI/AAAAAAAAFa4/yVDinoIWhfs/s72-c/Screen+shot+2010-01-19+at+12.06.08+AM.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-2343024642065519920</id><published>2010-01-11T06:58:00.001-08:00</published><updated>2010-01-11T06:58:31.828-08:00</updated><title type='text'>Maximum Entropy Modeling</title><content type='html'>http://homepages.inf.ed.ac.uk/lzhang10/maxent.html&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-2343024642065519920?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/2343024642065519920/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=2343024642065519920' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2343024642065519920'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2343024642065519920'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/01/maximum-entropy-modeling.html' title='Maximum Entropy Modeling'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-466736849937014199</id><published>2010-01-08T07:22:00.001-08:00</published><updated>2010-06-24T22:26:04.657-07:00</updated><title type='text'>Generate Core Dump Files</title><content type='html'>If you want your program to generate core dump files (including the stack trace) when it encounters a segmentation fault, remember to set the following shell option&lt;pre&gt;ulimit -c unlimited&lt;/pre&gt;before running your program.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Once the core file is generated (say &lt;tt&gt;core&lt;/tt&gt;), we can check the stack trace using GDB:&lt;pre&gt;gdb program_file core&lt;/pre&gt;Then type the GDB command&lt;pre&gt;where&lt;/pre&gt;which will show you the stack trace.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Special Notes for Cygwin&lt;/span&gt;&lt;br /&gt;&lt;p&gt;When a process (e.g. foo.exe) dumps core, a default stackdump file, foo.exe.stackdump, is generated. This stackdump contains (among other  things) stack frames and function addresses. You can make some sense of  it by using the `addr2line' utility, but it's not as convenient as  using a debugger.&lt;/p&gt; &lt;p&gt;Which takes me to the actually useful bit of information in this post.  
You can instruct Cygwin to start your gdb debugger just in time when a  fault occurs, or have Cygwin generate a real core dump.&lt;/p&gt; &lt;p&gt;To achieve this, add `error_start=action' to the CYGWIN environment  variable:&lt;/p&gt; &lt;pre&gt;# start gdb&lt;br /&gt;export CYGWIN="$CYGWIN error_start=gdb -nw %1 %2"&lt;br /&gt;&lt;br /&gt;# generate core dump&lt;br /&gt;export CYGWIN="$CYGWIN error_start=dumper -d %1 %2"&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-466736849937014199?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/466736849937014199/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=466736849937014199' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/466736849937014199'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/466736849937014199'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/01/generate-core-dump-files.html' title='Generate Core Dump Files'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-1100114848145000512</id><published>2010-01-08T07:09:00.000-08:00</published><updated>2010-01-08T07:11:33.211-08:00</updated><title type='text'>A Step-by-Step Tutorial on 
Autotools</title><content type='html'>&lt;div&gt;Autotools are quite complicated for new users. However, I was lucky this evening and found an excellent &lt;a href="http://www.lrde.epita.fr/~adl/autotools.html"&gt;step-by-step tutorial&lt;/a&gt;.  By following it, I packaged my Hadoop Streaming wrapper for C++ in a few minutes!  I would like to donate to the author if s/he wants. :-)&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-1100114848145000512?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/1100114848145000512/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=1100114848145000512' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1100114848145000512'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1100114848145000512'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/01/step-by-step-tutorial-on-autotools.html' title='A Step-by-Step Tutorial on Autotools'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-2492804238385755344</id><published>2010-01-07T21:45:00.000-08:00</published><updated>2010-01-08T08:09:48.257-08:00</updated><title type='text'>A C++ MapReduce "Implementation" Basing on Hadoop Streaming</title><content 
type='html'>Hadoop has two mechanisms to support using languages other than Java:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Hadoop &lt;i&gt;Pipes&lt;/i&gt;, which provides a C++ library pair to support Hadoop programs in C/C++ only, and&lt;/li&gt;&lt;li&gt;Hadoop &lt;i&gt;Streaming&lt;/i&gt;, which launches any executable file in map/reduce worker processes, and thus supports any language.&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;&lt;div&gt;However, in Hadoop 0.20.1, the support for Pipes, namely the Java code in package org.apache.hadoop.mapred.pipes, has been marked &lt;i&gt;deprecated&lt;/i&gt;.  So I guess Hadoop 0.20.1 has not been ported to fully support Pipes.  Some other posts in forums also discussed this issue.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;So, I would like to turn to Streaming and C++.  Michael G. Noll wrote an excellent &lt;a href="http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python"&gt;tutorial on Streaming using Python&lt;/a&gt;, which shows that Streaming is equivalent to invoking your map and reduce programs using the following shell command:&lt;br /&gt;&lt;pre&gt;cat input_file | map_program | sort | reduce_program&lt;/pre&gt;&lt;/div&gt;Of course, as you know, Hadoop runs this shell pipeline in parallel on a computing cluster:&lt;pre&gt;hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.20.1-streaming.jar \&lt;br /&gt;-file ./word_count_mapper -mapper ./word_count_mapper \&lt;br /&gt;-file ./word_count_reducer -reducer ./word_count_reducer \&lt;br /&gt;-input ./input/*.txt -output&lt;/pre&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;Based on Hadoop Streaming, I wrote a C++ MapReduce wrapper (more precisely, it should be called a MapReduce implementation, but the code is so simple when built on Hadoop Streaming that I feel embarrassed to call it an "implementation").   Anyway, I find it interesting that this simple wrapper supports secondary keys, whereas &lt;tt&gt;org.apache.hadoop.mapreduce &lt;/tt&gt;does not yet. 
:-)&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I have created a Google Code project to host this simple implementation: &lt;a href="http://code.google.com/p/hadoop-stream-mapreduce/"&gt;Hadoop Streaming MapReduce&lt;/a&gt;, and imported the code using the following command line:&lt;pre&gt;svn import hadoop-streaming-mapreduce/ https://hadoop-stream-mapreduce.googlecode.com/svn/trunk -m 'Initial import'&lt;/pre&gt;So you should be able to check out the code now.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-2492804238385755344?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/2492804238385755344/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=2492804238385755344' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2492804238385755344'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2492804238385755344'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/01/writing-hadoop-programs-using-c.html' title='A C++ MapReduce &quot;Implementation&quot; Basing on Hadoop Streaming'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-7413151825569604048</id><published>2010-01-06T00:58:00.001-08:00</published><updated>2010-01-06T01:22:51.830-08:00</updated><title type='text'>Map-Reduce-Merge for Join Operation</title><content type='html'>In this SIGMOD 2007 paper: &lt;a href="http://portal.acm.org/citation.cfm?doid=1247480.1247602"&gt;Map-reduce-merge: simplified relational data processing on large clusters&lt;/a&gt;, the authors add to Map-Reduce a Merge phase that can efficiently merge data already partitioned and sorted (or hashed) by map and reduce modules, and demonstrate that this new model can express relational algebra operators as well as implement several join algorithms.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;However, I think the newly added merge stage could be implemented using another MapReduce job: the mapper scans the key-value pairs of all lineages and outputs them unchanged; the shuffling stage merges values of different lineages but the same key into reduce inputs; finally, the reducer does whatever the merger is supposed to do.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Other people told me that there are more ways to do a merge using the MapReduce model.  But the simple solution above seems to be one of the most scalable.  
In short, if I were to implement a parallel programming model with the objective of supporting joins of relational data, I would just implement MapReduce, rather than MapReduce-Merge.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-7413151825569604048?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/7413151825569604048/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=7413151825569604048' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7413151825569604048'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7413151825569604048'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/01/map-reduce-merge-for-join-operation.html' title='Map-Reduce-Merge for Join Operation'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-6128857805144713496</id><published>2010-01-02T23:13:00.000-08:00</published><updated>2010-01-03T04:21:31.992-08:00</updated><title type='text'>Compare GNU GCJ with Sun's JVM</title><content type='html'>On &lt;a href="http://en.wikipedia.org/wiki/GNU_Compiler_for_Java"&gt;this Wikipedia page&lt;/a&gt;, there is &lt;a href="http://zigabyte.com/blog/GCJ_vs_Java_JIT_Performance_Comparison.doc"&gt;a link to Alex Ramos's 
experiment&lt;/a&gt;, which compares the performance of the native binary that GNU's GCJ generates from a Java program with that of the bytecode generated by Sun's JDK and run on a JIT JVM.  As Alex did the comparison only on an AMD CPU, I ran some additional ones.  Here are the results.&lt;br /&gt;&lt;style type="text/css"&gt;.nobrtable br { display: none }&lt;/style&gt;&lt;br /&gt;&lt;div class="nobrtable"&gt;&lt;br /&gt;&lt;table&gt;&lt;br /&gt;  &lt;tr style="background:lightgreen"&gt;&lt;br /&gt;    &lt;td&gt;System&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;Java version&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;Sum Mflops&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;Sqrt Mflops&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;Exp Mflops&lt;/td&gt;&lt;br /&gt;  &lt;/tr&gt;&lt;br /&gt;  &lt;tr&gt;&lt;br /&gt;    &lt;td&gt;2x AMD 64 5000+, Ubuntu&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;JIT 1.6.0_14&lt;/td&gt;&lt;br /&gt;    &lt;td style="background:yellow"&gt;99&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;43&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;10&lt;/td&gt;&lt;br /&gt;  &lt;/tr&gt;&lt;br /&gt;  &lt;tr&gt;&lt;br /&gt;    &lt;td&gt;&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;GCJ 4.3.2&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;64&lt;/td&gt;&lt;br /&gt;    &lt;td style="background:yellow"&gt;65&lt;/td&gt;&lt;br /&gt;    &lt;td style="background:yellow"&gt;13&lt;/td&gt;&lt;br /&gt;  &lt;/tr&gt;&lt;br /&gt;  &lt;tr&gt;&lt;br /&gt;    &lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;br /&gt;  &lt;/tr&gt;&lt;br /&gt;  &lt;tr&gt;&lt;br /&gt;    &lt;td&gt;2x Intel Core2 2.4GHz, Ubuntu&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;JIT 1.6.0_0&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;87.4&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;36.9&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;16.6&lt;/td&gt;&lt;br /&gt;  &lt;/tr&gt;&lt;br /&gt;  &lt;tr&gt;&lt;br /&gt;    &lt;td&gt;&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;GCJ 4.2.4&lt;/td&gt;&lt;br /&gt;    &lt;td style="background:yellow"&gt;150.6&lt;/td&gt;&lt;br /&gt;    &lt;td 
style="background:yellow"&gt;39.3&lt;/td&gt;&lt;br /&gt;    &lt;td style="background:yellow"&gt;30&lt;/td&gt;&lt;br /&gt;  &lt;/tr&gt;&lt;br /&gt;  &lt;tr&gt;&lt;br /&gt;    &lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;br /&gt;  &lt;/tr&gt;&lt;br /&gt;  &lt;tr&gt;&lt;br /&gt;    &lt;td&gt;Intel T2600 2.16GHz, Cygwin&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;JIT 1.6.0_17&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;45.4&lt;/td&gt;&lt;br /&gt;    &lt;td style="background:yellow"&gt;34.8&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;10.4&lt;/td&gt;&lt;br /&gt;  &lt;/tr&gt;&lt;br /&gt;  &lt;tr&gt;&lt;br /&gt;    &lt;td&gt;&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;GCJ 3.4.4&lt;/td&gt;&lt;br /&gt;    &lt;td style="background:yellow"&gt;84.1&lt;/td&gt;&lt;br /&gt;    &lt;td&gt;23.7&lt;/td&gt;&lt;br /&gt;    &lt;td style="background:yellow"&gt;12.1&lt;/td&gt;&lt;br /&gt;  &lt;/tr&gt;&lt;br /&gt;&lt;/table&gt;&lt;br /&gt;&lt;/div&gt;The first comparison was done by Alex; I just copied and pasted his results.  The second was done on my workstation, and the third on my IBM T60p notebook computer.  I also tried to do the comparison on my MacBook Pro, but MacPorts could not build and install GCJ correctly.&lt;br /&gt;&lt;br /&gt;&lt;font style="background:yellow"&gt;Generally, GCJ beats JIT on numerical computing.  However, I have to mention that it takes a lot more time to start the binary generated by GCJ. (I do not know why...)&lt;/font&gt;&lt;br /&gt;&lt;br /&gt;Attached is the Java source code (&lt;tt&gt;VectorMultiplication.java&lt;/tt&gt;), which is almost identical to Alex's but uses much shorter vectors (1M vs. 
20M), so more computers can run it.&lt;pre&gt;&lt;br /&gt;&lt;span class="keyword"&gt;import&lt;/span&gt; &lt;span class="constant"&gt;java&lt;/span&gt;.&lt;span class="constant"&gt;util&lt;/span&gt;.&lt;span class="type"&gt;Random&lt;/span&gt;;&lt;br /&gt;&lt;br /&gt;&lt;span class="keyword"&gt;public&lt;/span&gt; &lt;span class="keyword"&gt;class&lt;/span&gt; &lt;span class="type"&gt;VectorMultiplication&lt;/span&gt; {&lt;br /&gt;&lt;br /&gt;  &lt;span class="keyword"&gt;public&lt;/span&gt; &lt;span class="keyword"&gt;static&lt;/span&gt; &lt;span class="type"&gt;double&lt;/span&gt; &lt;span class="function-name"&gt;vector_mul&lt;/span&gt;(&lt;span class="type"&gt;double&lt;/span&gt; &lt;span class="variable-name"&gt;a&lt;/span&gt;[], &lt;span class="type"&gt;double&lt;/span&gt; &lt;span class="variable-name"&gt;b&lt;/span&gt;[], &lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;n&lt;/span&gt;, &lt;span class="type"&gt;double&lt;/span&gt; &lt;span class="variable-name"&gt;c&lt;/span&gt;[]) {&lt;br /&gt;    &lt;span class="type"&gt;double&lt;/span&gt; &lt;span class="variable-name"&gt;s&lt;/span&gt; = 0;&lt;br /&gt;    &lt;span class="keyword"&gt;for&lt;/span&gt; (&lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;i&lt;/span&gt; = 0; i &amp;lt; n; ++i)&lt;br /&gt;      s += c[i] = a[i] * b[i];&lt;br /&gt;    &lt;span class="keyword"&gt;return&lt;/span&gt; s;&lt;br /&gt;  }&lt;br /&gt;&lt;br /&gt;  &lt;span class="keyword"&gt;public&lt;/span&gt; &lt;span class="keyword"&gt;static&lt;/span&gt; &lt;span class="type"&gt;void&lt;/span&gt; &lt;span class="function-name"&gt;vector_sqrt&lt;/span&gt;(&lt;span class="type"&gt;double&lt;/span&gt; &lt;span class="variable-name"&gt;a&lt;/span&gt;[], &lt;span class="type"&gt;double&lt;/span&gt; &lt;span class="variable-name"&gt;b&lt;/span&gt;[], &lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;n&lt;/span&gt;) {&lt;br /&gt;    &lt;span 
class="keyword"&gt;for&lt;/span&gt; (&lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;i&lt;/span&gt; = 0; i &amp;lt; n; ++i)&lt;br /&gt;      b[i] = Math.sqrt(a[i]);&lt;br /&gt;  }&lt;br /&gt;&lt;br /&gt;  &lt;span class="keyword"&gt;public&lt;/span&gt; &lt;span class="keyword"&gt;static&lt;/span&gt; &lt;span class="type"&gt;void&lt;/span&gt; &lt;span class="function-name"&gt;vector_exp&lt;/span&gt;(&lt;span class="type"&gt;double&lt;/span&gt; &lt;span class="variable-name"&gt;a&lt;/span&gt;[], &lt;span class="type"&gt;double&lt;/span&gt; &lt;span class="variable-name"&gt;b&lt;/span&gt;[], &lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;n&lt;/span&gt;) {&lt;br /&gt;    &lt;span class="keyword"&gt;for&lt;/span&gt; (&lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;i&lt;/span&gt; = 0; i &amp;lt; n; ++i) &lt;br /&gt;      b[i] = Math.exp(a[i]);&lt;br /&gt;  }&lt;br /&gt;&lt;br /&gt;  &lt;span class="keyword"&gt;public&lt;/span&gt; &lt;span class="keyword"&gt;static&lt;/span&gt; &lt;span class="type"&gt;void&lt;/span&gt; &lt;span class="function-name"&gt;main&lt;/span&gt;(&lt;span class="type"&gt;String&lt;/span&gt;[] &lt;span class="variable-name"&gt;args&lt;/span&gt;) {&lt;br /&gt;    &lt;span class="keyword"&gt;final&lt;/span&gt; &lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;MEGA&lt;/span&gt; = 1000 * 1000;&lt;br /&gt;    &lt;span class="type"&gt;Random&lt;/span&gt; &lt;span class="variable-name"&gt;r&lt;/span&gt; = &lt;span class="keyword"&gt;new&lt;/span&gt; &lt;span class="type"&gt;Random&lt;/span&gt;(0);&lt;br /&gt;    &lt;span class="type"&gt;double&lt;/span&gt; &lt;span class="variable-name"&gt;a&lt;/span&gt;[], &lt;span class="variable-name"&gt;b&lt;/span&gt;[], &lt;span class="variable-name"&gt;c&lt;/span&gt;[];&lt;br /&gt;    &lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;n&lt;/span&gt; = 1 * MEGA;&lt;br /&gt;    a = 
&lt;span class="keyword"&gt;new&lt;/span&gt; &lt;span class="type"&gt;double&lt;/span&gt;[n];&lt;br /&gt;    b = &lt;span class="keyword"&gt;new&lt;/span&gt; &lt;span class="type"&gt;double&lt;/span&gt;[n];&lt;br /&gt;    c = &lt;span class="keyword"&gt;new&lt;/span&gt; &lt;span class="type"&gt;double&lt;/span&gt;[n];&lt;br /&gt;&lt;br /&gt;    &lt;span class="keyword"&gt;for&lt;/span&gt; (&lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;i&lt;/span&gt; = 0; i &amp;lt; n; ++i) {&lt;br /&gt;      a[i] = r.nextDouble();&lt;br /&gt;      b[i] = r.nextDouble();&lt;br /&gt;      c[i] = r.nextDouble();&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    &lt;span class="type"&gt;long&lt;/span&gt; &lt;span class="variable-name"&gt;start&lt;/span&gt; = System.currentTimeMillis();&lt;br /&gt;    vector_mul(a, b, n, c);&lt;br /&gt;    System.out.println(&lt;span class="string"&gt;"MULT MFLOPS: "&lt;/span&gt; +&lt;br /&gt;                       n/((System.currentTimeMillis() - start)/1000.0)/MEGA);&lt;br /&gt;&lt;br /&gt;    start = System.currentTimeMillis();&lt;br /&gt;    vector_sqrt(c, a, n);&lt;br /&gt;    System.out.println(&lt;span class="string"&gt;"SQRT MFLOPS: "&lt;/span&gt; +&lt;br /&gt;                       n/((System.currentTimeMillis() - start)/1000.0)/MEGA);&lt;br /&gt;&lt;br /&gt;    start = System.currentTimeMillis();&lt;br /&gt;    vector_exp(c, a, n);&lt;br /&gt;    System.out.println(&lt;span class="string"&gt;"EXP MFLOPS: "&lt;/span&gt; +&lt;br /&gt;                       n/((System.currentTimeMillis() - start)/1000.0)/MEGA);&lt;br /&gt;  }&lt;br /&gt;}&lt;/pre&gt;On my Core2 workstation, the way I invoked GCJ is identical to that used in Alex's experiment:&lt;pre&gt;gcj -O3 -fno-bounds-check -mfpmath=sse -ffast-math -march=native \&lt;br /&gt; --main=VectorMultiplication -o vec-mult VectorMultiplication.java&lt;br /&gt;&lt;/pre&gt;On my notebook, I used&lt;pre&gt;gcj -O3 -fno-bounds-check -ffast-math \&lt;br /&gt; 
--main=VectorMultiplication -o vec-mult VectorMultiplication.java&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-6128857805144713496?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/6128857805144713496/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=6128857805144713496' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6128857805144713496'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6128857805144713496'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/01/compare-gnu-gcj-with-suns-jvm.html' title='Compare GNU GCJ with Sun&apos;s JVM'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-404344556183280150</id><published>2010-01-01T17:11:00.000-08:00</published><updated>2010-01-19T19:30:28.696-08:00</updated><title type='text'>Learning Java as a C++ Programmer</title><content type='html'>&lt;b&gt;Primitive Data Types&lt;/b&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;tt&gt;char&lt;/tt&gt; is 16-bit. &lt;tt&gt;byte&lt;/tt&gt; is 8-bit. 
&lt;tt&gt;boolean&lt;/tt&gt; corresponds to &lt;tt&gt;bool&lt;/tt&gt; in C++.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;All Java integer types except &lt;tt&gt;char&lt;/tt&gt; are signed; &lt;tt&gt;char&lt;/tt&gt; is an unsigned 16-bit type. &lt;span style="color:red;"&gt;Why?&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;Casting&lt;/b&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Java is more strongly typed than C++.  There is no way, not even a cast, to convert between &lt;tt&gt;boolean&lt;/tt&gt; and integer types.&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;Operators&lt;/b&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Java adds two operators, &gt;&gt;&gt; and &gt;&gt;&gt;=. Each performs an unsigned right shift, that is, a right shift with zero fill.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Java operators cannot be overloaded, in order to prevent unnecessary bugs.&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;Struct and union&lt;/b&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Java has no &lt;tt&gt;struct&lt;/tt&gt; or &lt;tt&gt;union&lt;/tt&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;Arrays&lt;/b&gt;&lt;ul&gt;&lt;li&gt;Arrays are objects, declared as &lt;tt&gt;Type[]&lt;/tt&gt;.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Accessing an index that is out of range throws &lt;tt&gt;ArrayIndexOutOfBoundsException&lt;/tt&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;b&gt;Classes&lt;/b&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt; &lt;li&gt;Class data members can be given default values in their declarations.&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-404344556183280150?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/404344556183280150/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=404344556183280150' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/404344556183280150'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/6983561823392851671/posts/default/404344556183280150'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2010/01/learning-java-as-c-programmer.html' title='Learning Java as a C++ Programmer'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-3307180624087937333</id><published>2009-12-29T00:35:00.000-08:00</published><updated>2010-01-18T08:13:15.634-08:00</updated><title type='text'>Comparing Technologies Inside and Outside Google</title><content type='html'>Its leading position in several key technical fields has made Google a giant of the Internet industry.  Now, the open source community and other companies are catching up with Google in these fields.  Below is a comparison of published Google technologies with their counterparts developed outside of Google.  
I will update this list as I learn more.&lt;br /&gt;&lt;style type="text/css"&gt;.nobrtable br { display: none }&lt;/style&gt;&lt;br /&gt;&lt;div class="nobrtable"&gt;&lt;br /&gt;&lt;table valign="top"&gt;&lt;tbody&gt;&lt;tr style="background:lightgreen" border="1"&gt;  &lt;td&gt;Development Tools&lt;/td&gt;  &lt;td&gt;Inside Google&lt;/td&gt;  &lt;td&gt;Outside Google&lt;/td&gt;  &lt;td&gt;Remark&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;  &lt;td&gt;Code review&lt;/td&gt;  &lt;td&gt;Mondrian&lt;/td&gt;  &lt;td&gt;&lt;a href="http://code.google.com/p/rietveld"&gt;Rietveld&lt;/a&gt;&lt;/td&gt;  &lt;td&gt;Both Mondrian and Rietveld were written by Guido van Rossum, the inventor of Python.&lt;/td&gt; &lt;/tr&gt;&lt;tr style="background:lightgreen"&gt;  &lt;td&gt;Infrastructure&lt;/td&gt;  &lt;td&gt;Inside Google&lt;/td&gt;  &lt;td&gt;Outside Google&lt;/td&gt;  &lt;td&gt;Remark&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;  &lt;td&gt;Distributed File System&lt;/td&gt;  &lt;td&gt;GFS&lt;/td&gt;  &lt;td&gt;HDFS&lt;/td&gt;  &lt;td&gt;Hadoop's distributed file system&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;  &lt;td&gt;&lt;/td&gt;  &lt;td&gt;&lt;/td&gt;  &lt;td&gt;&lt;a href="http://kosmosfs.sourceforge.net/"&gt;CloudStore&lt;/a&gt;&lt;/td&gt;  &lt;td&gt;Formerly known as the Kosmos file system&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;  &lt;td&gt;File Format&lt;/td&gt;  &lt;td&gt;SSTable&lt;/td&gt;  &lt;td&gt;String Table&lt;/td&gt;  &lt;td&gt;Hadoop's file format for entries of key-value pairs&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;  &lt;td&gt;Distributed Storage&lt;/td&gt;  &lt;td&gt;Bigtable&lt;/td&gt;  &lt;td&gt;&lt;a href="http://www.hypertable.org/"&gt;Hypertable&lt;/a&gt;&lt;/td&gt;  &lt;td&gt;Baidu is a main sponsor of Hypertable.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;  &lt;td&gt;&lt;/td&gt;  &lt;td&gt;&lt;/td&gt;  &lt;td&gt;&lt;a href="http://hadoop.apache.org/hbase/"&gt;HBase&lt;/a&gt;&lt;/td&gt;  &lt;td&gt;A Hadoop sub-project that serves as an alternative to Bigtable.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;  &lt;td&gt;Parallel Computing&lt;/td&gt;  
&lt;td&gt;MapReduce&lt;/td&gt;  &lt;td&gt;&lt;a href="http://hadoop.apache.org/mapreduce"&gt;Hadoop&lt;/a&gt;&lt;/td&gt;  &lt;td&gt;Hadoop was started by Doug Cutting and Mike Cafarella, following Google's published GFS and MapReduce papers.&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;br /&gt;  &lt;td&gt;Remote Procedure Call&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;Protocol Buffer&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;&lt;a href="http://incubator.apache.org/thrift/"&gt;Thrift&lt;/a&gt;&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;Thrift was developed by Facebook and is now an Apache project.&lt;/td&gt;&lt;br /&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;br /&gt;  &lt;td&gt;Data Warehouse&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;Dremel&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;Hive&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;Hive was developed by Facebook and is now an Apache Hadoop project.&lt;/td&gt;&lt;br /&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr style="background:lightgreen"&gt;&lt;br /&gt;  &lt;td&gt;API&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;Inside Google&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;Outside Google&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;Remark&lt;/td&gt;&lt;br /&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;br /&gt;  &lt;td&gt;Networking&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;(I do not know)&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;boost.asio&lt;/td&gt;&lt;br /&gt;  &lt;td&gt;boost.asio provides a C++ abstraction for network programming.&lt;/td&gt;&lt;br /&gt;&lt;/tr&gt;&lt;br /&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-3307180624087937333?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/3307180624087937333/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=3307180624087937333' title='1 Comments'/><link rel='edit' 
type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/3307180624087937333'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/3307180624087937333'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/12/comparing-technologies-inside-and.html' title='Comparing Technologies Inside and Outside Google'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-5971079573184410429</id><published>2009-12-28T22:26:00.000-08:00</published><updated>2009-12-28T23:03:29.206-08:00</updated><title type='text'>Learning Boosting</title><content type='html'>&lt;b&gt;References&lt;/b&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;The Boosting Approach to Machine Learning: An Overview, R. E. Schapire, 2001.&lt;br /&gt;Schapire is one of the inventors of AdaBoost.  This article starts with the pseudocode of AdaBoost, which helps in understanding the basic procedure of boosting algorithms.&lt;/li&gt;&lt;/ol&gt;&lt;b&gt;What is Boosting?&lt;/b&gt;&lt;br /&gt;Boosting is a machine learning meta-algorithm for performing supervised learning. Boosting is based on the question posed by Kearns: can a set of weak learners create a single strong learner? (From Wikipedia)&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Boosting Algorithms&lt;/b&gt;&lt;br /&gt;Most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier. When they are added, they are typically weighted in some way that is usually related to the weak learners' accuracy. 
After a weak learner is added, the data is reweighted: examples that are misclassified gain weight and examples that are classified correctly lose weight (some boosting algorithms actually decrease the weight of repeatedly misclassified examples, e.g., boost by majority and BrownBoost). Thus, future weak learners focus more on the examples that previous weak learners misclassified.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;AdaBoost&lt;/b&gt;&lt;br /&gt;The pseudocode of AdaBoost is as follows:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_shbz2iI4vAY/Szmn9QH84xI/AAAAAAAAFZg/FqNHWyoDwe8/s1600-h/adaboost.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_shbz2iI4vAY/Szmn9QH84xI/AAAAAAAAFZg/FqNHWyoDwe8/s400/adaboost.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5420548297259279122" /&gt;&lt;/a&gt;As we can see from this algorithm:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;The weight distribution over training examples changes in each iteration, and the size of the change is determined by alpha.&lt;/li&gt;&lt;li&gt;The choice of alpha is not arbitrary; instead, it is based on the error of the weak learner.  
Refer to [1] for details.&lt;/li&gt;&lt;li&gt;The aggregation of weak learners uses alpha to weight each learner.&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-5971079573184410429?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/5971079573184410429/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=5971079573184410429' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5971079573184410429'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5971079573184410429'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/12/learning-boosting.html' title='Learning Boosting'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_shbz2iI4vAY/Szmn9QH84xI/AAAAAAAAFZg/FqNHWyoDwe8/s72-c/adaboost.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-6740824165955102502</id><published>2009-12-28T19:34:00.000-08:00</published><updated>2009-12-28T19:38:38.162-08:00</updated><title type='text'>Emacs Moves to Bazaar</title><content type='html'>I noticed on Solidot that Emacs has moved its version control to Bazaar, a version control system that can work in the traditional centralized way or a 
brand new distributed way.  After going over Bazaar's &lt;a href="http://doc.bazaar.canonical.com/bzr.2.0/en/user-guide/index.html"&gt;user guide&lt;/a&gt;, I feel that the distributed way suits developers all over the world who work on the same project: each developer commits code to his own local repository (which is why Bazaar is called distributed) to record his work, and publishes it via email or SFTP.  It then becomes the responsibility of the maintainer of the main repository to merge each individual's work back into the main branch.  I may need to read more about this, but it is interesting anyway.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-6740824165955102502?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/6740824165955102502/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=6740824165955102502' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6740824165955102502'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6740824165955102502'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/12/emacs-moves-to-bazaar.html' title='Emacs Moves to Bazaar'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-8777245442326858910</id><published>2009-12-28T18:37:00.000-08:00</published><updated>2009-12-28T19:07:45.076-08:00</updated><title type='text'>A Small Computing Cluster on Board</title><content type='html'>&lt;div&gt;In my previous &lt;a href="http://cxwangyi.blogspot.com/2009/12/collapsed-gibbs-sampling-of-lda-on-gpu.html"&gt;post&lt;/a&gt;, I mentioned a newly developed GPU-based parallel Gibbs sampling algorithm for inference of LDA.  Of course, as you know, there are many other GPU-based parallel algorithms that solve many interesting problems efficiently using NVidia's CUDA programming framework.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Moreover, by Googling "CUDA MapReduce", you will find MapReduce implementations based on CUDA and GPUs, developed by researchers at UC Berkeley, U Texas, Hong Kong Univ. of Sci.&amp;amp;Tech, etc.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;As for the supporting hardware, I recently noticed NVidia's &lt;a href="http://www.nvidia.com/object/product_tesla_c1060_us.html"&gt;Tesla processor board&lt;/a&gt;, which contains a 240-core Tesla 10 GPU and 4GB of on-board memory.  This card can be installed in workstations like the Dell &lt;a href="http://search.dell.com/results.aspx?s=gen&amp;amp;c=us&amp;amp;l=en&amp;amp;cs=&amp;amp;k=T7500&amp;amp;cat=all&amp;amp;x=0&amp;amp;y=0"&gt;Precision T7500&lt;/a&gt;. At the time of writing, the price of such a system is about RMB 43,000.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Last but not least, there are significant differences between GPU-clusters and more traditional computer-clusters.  A few are listed here:&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;There is no mature load-balancing mechanism on GPU-clusters.  
Currently, GPU-based parallel computing resembles the early stage of CPU-based parallel computing: there is no automatic balancing across the processors used by a task, and no scheduling or balancing across tasks.  This prevents multiple projects from sharing a GPU-cluster.&lt;/li&gt;&lt;li&gt;A GPU-cluster is based on a shared-memory architecture, so it is suitable only for the class of computing-intensive but data-sparse tasks.  I see only a few real-world problems that fit in this class.&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;That said, it is not wise to compare GPU-based parallel computing directly with multi-core CPU based solutions, because the latter fit naturally into multi-computer parallel computing and thereby achieve much higher scalability.&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-8777245442326858910?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/8777245442326858910/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=8777245442326858910' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8777245442326858910'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8777245442326858910'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/12/in-my-previous-post-i-mentioned-newly.html' title='A Small Computing Cluster on Board'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-4517121818863214998</id><published>2009-12-28T02:24:00.001-08:00</published><updated>2009-12-28T02:40:43.522-08:00</updated><title type='text'>Using Facebook Thrift</title><content type='html'>While looking for a general RPC solution, I came across Thrift.  It was developed at Facebook and is now an Apache project.  Thrift is very similar to Google's Protocol Buffers, which are open source and hosted on Google Code.  To me, Thrift seems to support more languages and to come with a whole set of surrounding support, including a thread manager and server/client stubs.   More interestingly, Thrift supports exceptions, that is, exceptions thrown by remote methods can be caught by client code. (As far as I remember, Google's Protocol Buffers do not support this.)&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;However, Thrift does not yet have an official tutorial, so here is a very brief one. (I will switch to the official one once it is published.)&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Download Thrift from &lt;a href="http://incubator.apache.org/thrift"&gt;http://incubator.apache.org/thrift&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Unpack the .tar.gz file to create &lt;tt&gt;/tmp/thrift-0.2.0&lt;/tt&gt;&lt;/li&gt;&lt;li&gt;Configure, build and install &lt;pre&gt;./configure --prefix=$HOME/thrift-0.2.0 CXXFLAGS='-g -O2'&lt;br /&gt;make&lt;br /&gt;make install&lt;/pre&gt;&lt;/li&gt;&lt;li&gt;Generate source code from tutorial.thrift&lt;pre&gt;cd tutorial&lt;br /&gt;$HOME/thrift-0.2.0/bin/thrift -r --gen cpp tutorial.thrift&lt;/pre&gt;Note that the &lt;tt&gt;-r&lt;/tt&gt; flag also generates code for the included files.  
The resulting source code is placed in a sub-directory named &lt;tt&gt;gen-cpp&lt;/tt&gt;.&lt;/li&gt;&lt;li&gt;Compile example C++ server and client programs in &lt;tt&gt;tutorial/cpp&lt;/tt&gt;&lt;pre&gt;cd cpp&lt;br /&gt;make&lt;/pre&gt;Note that you might need to edit the &lt;tt&gt;Makefile&lt;/tt&gt; so that it points to the lib and include directories where Thrift was installed.&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-4517121818863214998?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/4517121818863214998/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=4517121818863214998' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4517121818863214998'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4517121818863214998'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/12/using-facebook-thrift.html' title='Using Facebook Thrift'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-2345890138707535311</id><published>2009-12-27T08:35:00.000-08:00</published><updated>2010-01-07T20:30:34.812-08:00</updated><title type='text'>A WordCount Tutorial for Hadoop 0.20.1</title><content type='html'>Because the documentation of Hadoop 
0.20.1 describes &lt;a href="http://hadoop.apache.org/common/docs/r0.20.1/mapred_tutorial.html#Source+Code"&gt;a tutorial program which uses out-of-date APIs&lt;/a&gt;, I decided to write the following tutorial for Hadoop 0.20.1.  Note that in 0.20.1 the classes in org.apache.hadoop.mapred.* are deprecated in favor of org.apache.hadoop.mapreduce.*.  This tutorial is based on the new API.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For how to install and configure Hadoop, you might want to refer to my &lt;a href="http://cxwangyi.blogspot.com/2009/12/configuring-hadoop-on-mac-os-x.html"&gt;previous post&lt;/a&gt;.  After Hadoop is installed, let us create a source code directory and put the following Java source file, WordCount.java, in it:&lt;br /&gt;&lt;style type="text/css"&gt;&lt;br /&gt;    &lt;!--       body {         color: #000000;         background-color: #ffffff;       }       .constant {         /* font-lock-constant-face */         color: #5f9ea0;       }       .doc {         /* font-lock-doc-face */         color: #bc8f8f;       }       .function-name {         /* font-lock-function-name-face */         color: #0000ff;       }       .keyword {         /* font-lock-keyword-face */         color: #a020f0;       }       .string {         /* font-lock-string-face */         color: #bc8f8f;       }       .type {         /* font-lock-type-face */         color: #228b22;       }       .variable-name {         /* font-lock-variable-name-face */         color: #b8860b;       }        a {         color: inherit;         background-color: inherit;         font: inherit;         text-decoration: inherit;       }       a:hover {         text-decoration: underline;       }     --&gt;&lt;br /&gt;    &lt;/style&gt;    &lt;pre&gt;&lt;span class="keyword"&gt;package&lt;/span&gt; org.&lt;span class="constant"&gt;sogou&lt;/span&gt;;&lt;br /&gt;&lt;br /&gt;&lt;span class="keyword"&gt;import&lt;/span&gt; &lt;span class="constant"&gt;java&lt;/span&gt;.&lt;span 
class="constant"&gt;io&lt;/span&gt;.&lt;span class="type"&gt;IOException&lt;/span&gt;;&lt;br /&gt;&lt;span class="keyword"&gt;import&lt;/span&gt; &lt;span class="constant"&gt;java&lt;/span&gt;.&lt;span class="constant"&gt;lang&lt;/span&gt;.&lt;span class="type"&gt;InterruptedException&lt;/span&gt;;&lt;br /&gt;&lt;span class="keyword"&gt;import&lt;/span&gt; &lt;span class="constant"&gt;java&lt;/span&gt;.&lt;span class="constant"&gt;util&lt;/span&gt;.&lt;span class="type"&gt;StringTokenizer&lt;/span&gt;;&lt;br /&gt;&lt;br /&gt;&lt;span class="keyword"&gt;import&lt;/span&gt; &lt;span class="constant"&gt;org&lt;/span&gt;.&lt;span class="constant"&gt;apache&lt;/span&gt;.&lt;span class="constant"&gt;hadoop&lt;/span&gt;.&lt;span class="constant"&gt;io&lt;/span&gt;.&lt;span class="type"&gt;IntWritable&lt;/span&gt;;&lt;br /&gt;&lt;span class="keyword"&gt;import&lt;/span&gt; &lt;span class="constant"&gt;org&lt;/span&gt;.&lt;span class="constant"&gt;apache&lt;/span&gt;.&lt;span class="constant"&gt;hadoop&lt;/span&gt;.&lt;span class="constant"&gt;io&lt;/span&gt;.&lt;span class="type"&gt;Text&lt;/span&gt;;&lt;br /&gt;&lt;span class="keyword"&gt;import&lt;/span&gt; &lt;span class="constant"&gt;org&lt;/span&gt;.&lt;span class="constant"&gt;apache&lt;/span&gt;.&lt;span class="constant"&gt;hadoop&lt;/span&gt;.&lt;span class="constant"&gt;conf&lt;/span&gt;.&lt;span class="type"&gt;Configuration&lt;/span&gt;;&lt;br /&gt;&lt;span class="keyword"&gt;import&lt;/span&gt; &lt;span class="constant"&gt;org&lt;/span&gt;.&lt;span class="constant"&gt;apache&lt;/span&gt;.&lt;span class="constant"&gt;hadoop&lt;/span&gt;.&lt;span class="constant"&gt;fs&lt;/span&gt;.&lt;span class="type"&gt;Path&lt;/span&gt;;&lt;br /&gt;&lt;span class="keyword"&gt;import&lt;/span&gt; &lt;span class="constant"&gt;org&lt;/span&gt;.&lt;span class="constant"&gt;apache&lt;/span&gt;.&lt;span class="constant"&gt;hadoop&lt;/span&gt;.&lt;span class="constant"&gt;mapreduce&lt;/span&gt;.&lt;span 
class="type"&gt;Job&lt;/span&gt;;&lt;br /&gt;&lt;span class="keyword"&gt;import&lt;/span&gt; &lt;span class="constant"&gt;org&lt;/span&gt;.&lt;span class="constant"&gt;apache&lt;/span&gt;.&lt;span class="constant"&gt;hadoop&lt;/span&gt;.&lt;span class="constant"&gt;mapreduce&lt;/span&gt;.&lt;span class="type"&gt;Mapper&lt;/span&gt;;&lt;br /&gt;&lt;span class="keyword"&gt;import&lt;/span&gt; &lt;span class="constant"&gt;org&lt;/span&gt;.&lt;span class="constant"&gt;apache&lt;/span&gt;.&lt;span class="constant"&gt;hadoop&lt;/span&gt;.&lt;span class="constant"&gt;mapreduce&lt;/span&gt;.&lt;span class="type"&gt;Reducer&lt;/span&gt;;&lt;br /&gt;&lt;span class="keyword"&gt;import&lt;/span&gt; &lt;span class="constant"&gt;org&lt;/span&gt;.&lt;span class="constant"&gt;apache&lt;/span&gt;.&lt;span class="constant"&gt;hadoop&lt;/span&gt;.&lt;span class="constant"&gt;mapreduce&lt;/span&gt;.&lt;span class="constant"&gt;lib&lt;/span&gt;.&lt;span class="constant"&gt;input&lt;/span&gt;.&lt;span class="type"&gt;FileInputFormat&lt;/span&gt;;&lt;br /&gt;&lt;span class="keyword"&gt;import&lt;/span&gt; &lt;span class="constant"&gt;org&lt;/span&gt;.&lt;span class="constant"&gt;apache&lt;/span&gt;.&lt;span class="constant"&gt;hadoop&lt;/span&gt;.&lt;span class="constant"&gt;mapreduce&lt;/span&gt;.&lt;span class="constant"&gt;lib&lt;/span&gt;.&lt;span class="constant"&gt;output&lt;/span&gt;.&lt;span class="type"&gt;FileOutputFormat&lt;/span&gt;;&lt;br /&gt;&lt;span class="keyword"&gt;import&lt;/span&gt; &lt;span class="constant"&gt;org&lt;/span&gt;.&lt;span class="constant"&gt;apache&lt;/span&gt;.&lt;span class="constant"&gt;hadoop&lt;/span&gt;.&lt;span class="constant"&gt;util&lt;/span&gt;.&lt;span class="type"&gt;GenericOptionsParser&lt;/span&gt;;&lt;br /&gt;&lt;br /&gt;&lt;span class="keyword"&gt;public&lt;/span&gt; &lt;span class="keyword"&gt;class&lt;/span&gt; &lt;span class="type"&gt;WordCount&lt;/span&gt; {&lt;br /&gt;&lt;span class="doc"&gt;/**&lt;br /&gt; * The map class of 
WordCount.&lt;br /&gt; */&lt;/span&gt;&lt;br /&gt;&lt;span class="keyword"&gt;public&lt;/span&gt; &lt;span class="keyword"&gt;static&lt;/span&gt; &lt;span class="keyword"&gt;class&lt;/span&gt; &lt;span class="type"&gt;TokenCounterMapper&lt;/span&gt;&lt;br /&gt;    &lt;span class="keyword"&gt;extends&lt;/span&gt; &lt;span class="type"&gt;Mapper&lt;/span&gt;&amp;lt;Object, &lt;span class="type"&gt;Text&lt;/span&gt;, &lt;span class="type"&gt;Text&lt;/span&gt;, &lt;span class="type"&gt;IntWritable&lt;/span&gt;&amp;gt; {&lt;br /&gt;        &lt;br /&gt;    &lt;span class="keyword"&gt;private&lt;/span&gt; &lt;span class="keyword"&gt;final&lt;/span&gt; &lt;span class="keyword"&gt;static&lt;/span&gt; &lt;span class="type"&gt;IntWritable&lt;/span&gt; &lt;span class="variable-name"&gt;one&lt;/span&gt; = &lt;span class="keyword"&gt;new&lt;/span&gt; &lt;span class="type"&gt;IntWritable&lt;/span&gt;(1);&lt;br /&gt;    &lt;span class="keyword"&gt;private&lt;/span&gt; &lt;span class="type"&gt;Text&lt;/span&gt; &lt;span class="variable-name"&gt;word&lt;/span&gt; = &lt;span class="keyword"&gt;new&lt;/span&gt; &lt;span class="type"&gt;Text&lt;/span&gt;();&lt;br /&gt;&lt;br /&gt;    &lt;span class="keyword"&gt;public&lt;/span&gt; &lt;span class="type"&gt;void&lt;/span&gt; &lt;span class="variable-name"&gt;map&lt;/span&gt;(&lt;span class="type"&gt;Object&lt;/span&gt; &lt;span class="variable-name"&gt;key&lt;/span&gt;, &lt;span class="type"&gt;Text&lt;/span&gt; &lt;span class="variable-name"&gt;value&lt;/span&gt;, &lt;span class="type"&gt;Context&lt;/span&gt; &lt;span class="variable-name"&gt;context&lt;/span&gt;)&lt;br /&gt;        &lt;span class="keyword"&gt;throws&lt;/span&gt; &lt;span class="type"&gt;IOException&lt;/span&gt;, &lt;span class="type"&gt;InterruptedException&lt;/span&gt; {&lt;br /&gt;        &lt;span class="type"&gt;StringTokenizer&lt;/span&gt; &lt;span class="variable-name"&gt;itr&lt;/span&gt; = &lt;span class="keyword"&gt;new&lt;/span&gt; &lt;span 
class="type"&gt;StringTokenizer&lt;/span&gt;(value.toString());&lt;br /&gt;        &lt;span class="keyword"&gt;while&lt;/span&gt; (itr.hasMoreTokens()) {&lt;br /&gt;            word.set(itr.nextToken());&lt;br /&gt;            context.write(word, one);&lt;br /&gt;        }&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;span class="doc"&gt;/**&lt;br /&gt; * The reducer class of WordCount&lt;br /&gt; */&lt;/span&gt;&lt;br /&gt;&lt;span class="keyword"&gt;public&lt;/span&gt; &lt;span class="keyword"&gt;static&lt;/span&gt; &lt;span class="keyword"&gt;class&lt;/span&gt; &lt;span class="type"&gt;TokenCounterReducer&lt;/span&gt;&lt;br /&gt;    &lt;span class="keyword"&gt;extends&lt;/span&gt; &lt;span class="type"&gt;Reducer&lt;/span&gt;&amp;lt;Text, &lt;span class="type"&gt;IntWritable&lt;/span&gt;, &lt;span class="type"&gt;Text&lt;/span&gt;, &lt;span class="type"&gt;IntWritable&lt;/span&gt;&amp;gt; {&lt;br /&gt;    &lt;span class="keyword"&gt;public&lt;/span&gt; &lt;span class="type"&gt;void&lt;/span&gt; &lt;span class="variable-name"&gt;reduce&lt;/span&gt;(&lt;span class="type"&gt;Text&lt;/span&gt; &lt;span class="variable-name"&gt;key&lt;/span&gt;, Iterable&amp;lt;IntWritable&amp;gt; values, &lt;span class="type"&gt;Context&lt;/span&gt; &lt;span class="variable-name"&gt;context&lt;/span&gt;)&lt;br /&gt;        &lt;span class="keyword"&gt;throws&lt;/span&gt; &lt;span class="type"&gt;IOException&lt;/span&gt;, &lt;span class="type"&gt;InterruptedException&lt;/span&gt; {&lt;br /&gt;        &lt;span class="type"&gt;int&lt;/span&gt; &lt;span class="variable-name"&gt;sum&lt;/span&gt; = 0;&lt;br /&gt;        &lt;span class="keyword"&gt;for&lt;/span&gt; (IntWritable value : values) {&lt;br /&gt;            sum += value.get();&lt;br /&gt;        }&lt;br /&gt;        context.write(key, &lt;span class="keyword"&gt;new&lt;/span&gt; &lt;span class="type"&gt;IntWritable&lt;/span&gt;(sum));&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;span class="doc"&gt;/**&lt;br /&gt; * The main entry 
point.&lt;br /&gt; */&lt;/span&gt;&lt;br /&gt;&lt;span class="keyword"&gt;public&lt;/span&gt; &lt;span class="keyword"&gt;static&lt;/span&gt; &lt;span class="type"&gt;void&lt;/span&gt; &lt;span class="function-name"&gt;main&lt;/span&gt;(&lt;span class="type"&gt;String&lt;/span&gt;[] &lt;span class="variable-name"&gt;args&lt;/span&gt;) &lt;span class="keyword"&gt;throws&lt;/span&gt; &lt;span class="type"&gt;Exception&lt;/span&gt; {&lt;br /&gt;    &lt;span class="type"&gt;Configuration&lt;/span&gt; &lt;span class="variable-name"&gt;conf&lt;/span&gt; = &lt;span class="keyword"&gt;new&lt;/span&gt; &lt;span class="type"&gt;Configuration&lt;/span&gt;();&lt;br /&gt;    &lt;span class="type"&gt;String&lt;/span&gt;[] &lt;span class="variable-name"&gt;otherArgs&lt;/span&gt; = &lt;span class="keyword"&gt;new&lt;/span&gt; &lt;span class="type"&gt;GenericOptionsParser&lt;/span&gt;(conf, args).getRemainingArgs();&lt;br /&gt;    &lt;span class="type"&gt;Job&lt;/span&gt; &lt;span class="variable-name"&gt;job&lt;/span&gt; = &lt;span class="keyword"&gt;new&lt;/span&gt; &lt;span class="type"&gt;Job&lt;/span&gt;(conf, &lt;span class="string"&gt;"Example Hadoop 0.20.1 WordCount"&lt;/span&gt;);&lt;br /&gt;    job.setJarByClass(WordCount.&lt;span class="keyword"&gt;class&lt;/span&gt;);&lt;br /&gt;    job.setMapperClass(TokenCounterMapper.&lt;span class="keyword"&gt;class&lt;/span&gt;);&lt;br /&gt;    job.setReducerClass(TokenCounterReducer.&lt;span class="keyword"&gt;class&lt;/span&gt;);&lt;br /&gt;    job.setOutputKeyClass(Text.&lt;span class="keyword"&gt;class&lt;/span&gt;);&lt;br /&gt;    job.setOutputValueClass(IntWritable.&lt;span class="keyword"&gt;class&lt;/span&gt;);&lt;br /&gt;    FileInputFormat.addInputPath(job, &lt;span class="keyword"&gt;new&lt;/span&gt; &lt;span class="type"&gt;Path&lt;/span&gt;(otherArgs[0]));&lt;br /&gt;    FileOutputFormat.setOutputPath(job, &lt;span class="keyword"&gt;new&lt;/span&gt; &lt;span class="type"&gt;Path&lt;/span&gt;(otherArgs[1]));&lt;br 
/&gt;    System.exit(job.waitForCompletion(&lt;span class="constant"&gt;true&lt;/span&gt;) ? 0 : 1);&lt;br /&gt;}&lt;br /&gt;}  &lt;/pre&gt;Then, we build this file and pack the result into a jar file:&lt;br /&gt;&lt;pre&gt;mkdir classes&lt;br /&gt;javac -classpath /Users/wyi/hadoop-0.20.1/hadoop-0.20.1-core.jar:/Users/wyi/hadoop-0.20.1/lib/commons-cli-1.2.jar -d classes WordCount.java &amp;amp;&amp;amp; jar -cvf wordcount.jar -C classes/ .&lt;br /&gt;&lt;/pre&gt;Finally, we run the jar file in Hadoop's standalone mode:&lt;br /&gt;&lt;pre&gt;mkdir -p /Users/wyi/tmp/in&lt;br /&gt;echo "hello world bye world" &gt; /Users/wyi/tmp/in/0.txt&lt;br /&gt;echo "hello hadoop goodbye hadoop" &gt; /Users/wyi/tmp/in/1.txt&lt;br /&gt;hadoop jar wordcount.jar org.sogou.WordCount /Users/wyi/tmp/in /Users/wyi/tmp/out&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-2345890138707535311?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/2345890138707535311/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=2345890138707535311' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2345890138707535311'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2345890138707535311'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/12/wordcount-tutorial-for-hadoop-0201.html' title='A WordCount Tutorial for Hadoop 0.20.1'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-7368576907896919211</id><published>2009-12-27T04:56:00.001-08:00</published><updated>2009-12-27T20:34:30.018-08:00</updated><title type='text'>Install and Configure Hadoop on Mac OS X</title><content type='html'>&lt;b&gt;Download and Install&lt;/b&gt; &lt;ol&gt;&lt;li&gt; Download Hadoop (at the time of writing, it is version 0.20.1) and unpack it into, say, ~wyi/hadoop-0.20.1. &lt;/li&gt;&lt;li&gt; Install JDK 1.6 for Mac OS X. &lt;/li&gt;&lt;li&gt; Edit your ~/.bash_profile to add the following lines:&lt;br /&gt;&lt;pre&gt;export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home&lt;br /&gt;export HADOOP_HOME=~wyi/hadoop-0.20.1&lt;br /&gt;export PATH=$HADOOP_HOME/bin:$PATH&lt;/pre&gt;&lt;/li&gt;&lt;li&gt;Edit ~wyi/hadoop-0.20.1/conf/hadoop-env.sh to define the JAVA_HOME variable as&lt;pre&gt;export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home &lt;/pre&gt;&lt;/li&gt;&lt;li&gt; Run the command hadoop to check that the installation works. &lt;/li&gt;&lt;/ol&gt; &lt;b&gt;Run An Example Program&lt;/b&gt;&lt;br /&gt;By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging. The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. 
Output is written to the given output directory.&lt;ul&gt;&lt;pre&gt;cd ~wyi/hadoop-0.20.1&lt;br /&gt;mkdir input&lt;br /&gt;cp conf/*.xml input&lt;br /&gt;bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'&lt;br /&gt;cat output/*&lt;/pre&gt;&lt;/ul&gt;&lt;div&gt; Note that before you re-run this example, you need to delete the output directory; otherwise, Hadoop will complain that the directory already exists.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-7368576907896919211?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/7368576907896919211/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=7368576907896919211' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7368576907896919211'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7368576907896919211'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/12/configuring-hadoop-on-mac-os-x.html' title='Install and Configure Hadoop on Mac OS X'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-1805319874436082488</id><published>2009-12-27T03:29:00.000-08:00</published><updated>2009-12-27T06:53:24.287-08:00</updated><title type='text'>Some Interesting Ant External 
Tools/Tasks</title><content type='html'>&lt;b&gt;AntPrettyBuild&lt;/b&gt;                         &lt;p&gt;           Ant Pretty Build is a tool to easily show and run Ant buildfiles directly from         within a browser window. It consists of a single XSL file that will generate,         on the fly, in the browser, from the .xml buildfile, a pretty interface showing         project name, description, properties and targets, etc. sorted or unsorted,           allowing to load/modify/add properties, run the whole project, or run selected         set of targets in a specific order, with the ability to modify logger/logfile,         mode and add more libs or command line arguments.&lt;br /&gt;&lt;/p&gt;&lt;b&gt;Checkstyle       &lt;/b&gt;                         &lt;p&gt;Checkstyle is a development tool to help programmers write         Java code that adheres to a coding standard. Its purpose is to         automate the process of checking Java code, and to spare         humans of this boring (but important) task.&lt;/p&gt;                                 &lt;p&gt;Checkstyle can be run via an Ant task or a command line         utility.&lt;/p&gt;&lt;b&gt;Hammurapi&lt;/b&gt;                         &lt;p&gt;Java code review tool. Performs automated code         review. Contains 111 inspectors which check different aspects         of code quality including coding standards, EJB, threading,         ...&lt;/p&gt;&lt;b&gt;ProGuard&lt;/b&gt;                         &lt;p&gt;&lt;a href="http://proguard.sourceforge.net/"&gt;ProGuard&lt;/a&gt; is         a free Java class file shrinker and obfuscator.  It can detect         and remove unused classes, fields, methods, and attributes. It         can then rename the remaining classes, fields, and methods         using short meaningless names.&lt;/p&gt;&lt;b&gt;CleanImports&lt;/b&gt;                         &lt;p&gt;Removes unneeded imports. Formats your import         sections. 
Flags ambiguous imports.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-1805319874436082488?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/1805319874436082488/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=1805319874436082488' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1805319874436082488'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1805319874436082488'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/12/interesting-ant-external-toolstasks.html' title='Some Interesting Ant External Tools/Tasks'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-6789304705033284212</id><published>2009-12-24T05:32:00.000-08:00</published><updated>2009-12-24T05:39:14.237-08:00</updated><title type='text'>High-dimensional Data Processing</title><content type='html'>&lt;div&gt;&lt;div&gt;The three top performing classes of algorithms for high-dimensional data sets are &lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;logistic regression, &lt;/li&gt;&lt;li&gt;Random Forests and &lt;/li&gt;&lt;li&gt;SVMs.&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;div&gt;Although logistic regression can be inferior to non-linear algorithms, e.g. 
kernel SVMs, for low-dimensional data sets, it often performs equally well in high dimensions, when the number of features goes over 10,000, because &lt;span style="background-color:yellow;"&gt;most data sets become linearly separable when the number of features becomes very large&lt;/span&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Given that logistic regression is often faster to train than more complex models like Random Forests and SVMs, in many situations it is the preferable method for dealing with high-dimensional data sets.&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-6789304705033284212?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/6789304705033284212/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=6789304705033284212' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6789304705033284212'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6789304705033284212'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/12/high-dimensional-data-processing.html' title='High-dimensional Data Processing'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-8248840550553550894</id><published>2009-12-22T22:41:00.000-08:00</published><updated>2009-12-22T22:51:24.910-08:00</updated><title type='text'>Native Multitouch for Linux</title><content type='html'>&lt;div&gt;A research institute in France worked with Linux kernel developers to create native multi-touch support for Linux.  I like the following demo video on YouTube:&lt;/div&gt;&lt;div&gt;  &lt;a href="http://www.youtube.com/watch?v=DTeUbx_nnM4"&gt;http://www.youtube.com/watch?v=DTeUbx_nnM4&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;More details can be found at:&lt;/div&gt;&lt;div&gt;  &lt;a href="http://lii-enac.fr/en/projects/shareit/linux.html"&gt;http://lii-enac.fr/en/projects/shareit/linux.html&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-8248840550553550894?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/8248840550553550894/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=8248840550553550894' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8248840550553550894'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8248840550553550894'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/12/native-multitouch-for-linux.html' title='Native Multitouch for Linux'/><author><name>Yi 
Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-6591098727557576797</id><published>2009-12-22T18:36:00.001-08:00</published><updated>2009-12-22T18:39:48.606-08:00</updated><title type='text'>Skyfire: Mobile Web Browsing over GFW</title><content type='html'>Just tried Skyfire on my Nokia E71 mobile phone.  It can not only get past the GFW but also play YouTube videos (the E71 does not have a fully functional Flash player).  A colleague told me that Skyfire renders Web pages (including embedded video) on the server and sends the resulting images to my mobile phone. It is truly a thin client, but I will pay my network carrier a lot for the data traffic.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-6591098727557576797?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/6591098727557576797/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=6591098727557576797' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6591098727557576797'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6591098727557576797'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/12/skyfire-mobile-web-browsing-over-gfw.html' title='Skyfire: Mobile Web Browsing over GFW'/><author><name>Yi 
Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-5330317901294544875</id><published>2009-12-20T21:56:00.000-08:00</published><updated>2009-12-20T22:06:43.074-08:00</updated><title type='text'>Collapsed Gibbs Sampling of LDA on GPU</title><content type='html'>Thanks to Feng Yan, who sent me his newly published work on &lt;a href="http://www.cs.purdue.edu/homes/taowang/MLseminar/546_paper.pdf"&gt;parallel inference of LDA on GPU&lt;/a&gt;.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The basic motivation is that on a GPU, the graphics card memory is too small to hold a copy of the n&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;wk&lt;/span&gt; matrix for each core.  So the very basic requirement is to keep one global n&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;wk&lt;/span&gt; matrix shared by all cores.  This brings a new requirement: when multiple cores work together in sampling, they must not update the same element of n&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;wk&lt;/span&gt; simultaneously.  Feng's solution is to partition the training data not only by documents but also by words. 
This is viable because of the following observation:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;for word w1 in document j1 and word w2 in document j2, if w1!=w2 and j1!=j2, simultaneous updates of their topic assignments cause no read/write conflicts on either the document-topic matrix n&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;jk&lt;/span&gt; or the word-topic matrix n&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;wk&lt;/span&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;Feng also presents a preprocessing algorithm that computes an optimal data partition with the goal of load balancing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-5330317901294544875?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/5330317901294544875/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=5330317901294544875' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5330317901294544875'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5330317901294544875'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/12/collapsed-gibbs-sampling-of-lda-on-gpu.html' title='Collapsed Gibbs Sampling of LDA on GPU'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-4915064613691303322</id><published>2009-12-20T18:52:00.001-08:00</published><updated>2009-12-23T02:17:38.088-08:00</updated><title type='text'>A Nice Introduction to Logistic Regression</title><content type='html'>&lt;a href="http://luna.cas.usf.edu/~mbrannic/files/regression/Logistic.html"&gt;http://luna.cas.usf.edu/~mbrannic/files/regression/Logistic.html&lt;/a&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Among the many textbooks and tutorials on logistic regression, the very introductory one at the above link explains &lt;i&gt;where the logistic regression model comes from&lt;/i&gt;:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In the binary classification problem, it is intuitive to determine whether an instance x belongs to class 0 or class 1 by the ratio P(c=1|x) / P(c=0|x).  Denoting P = P(c=1|x) and 1-P = P(c=0|x), the ratio becomes the &lt;b&gt;odds&lt;/b&gt; P/(1-P).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;However, a bad property of the odds is that it is asymmetric w.r.t. P.  For example, swapping the values of P and 1-P does not negate the value of P/(1-P) (it gives the reciprocal instead).  However, the swapping does negate the &lt;b&gt;logit&lt;/b&gt; ln P/(1-P).  
So, it becomes reasonable to make the logit, instead of the odds, our dependent variable.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;By modeling the dependent variable with a linear form, we get:&lt;/div&gt;&lt;div style="text-align: center;"&gt;ln P/(1-P) = a + bx&lt;/div&gt;&lt;div&gt;which is equivalent to &lt;/div&gt;&lt;div style="text-align: center;"&gt;P = e&lt;sup&gt;a+bx&lt;/sup&gt; / (1 + e&lt;sup&gt;a+bx&lt;/sup&gt;)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The above tutorial also &lt;i&gt;compares linear regression with logistic regression&lt;/i&gt;:&lt;/div&gt;&lt;div&gt;"If you use linear regression, the predicted values will become greater than one and less than zero if you move far enough on the X-axis. Such values are theoretically inadmissible."&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This explains that logistic regression does not estimate the relation between x and c directly; instead, it estimates the relation between x and P(c|x), and uses P(c|x) to determine whether x is in c=1 or c=0.  So logistic regression &lt;b&gt;is not regression&lt;/b&gt;; it is a &lt;b&gt;classifier&lt;/b&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Additional information:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;A C++ implementation of large-scale logistic regression (together with a tech report) can be found at:&lt;br /&gt;    &lt;a href="http://stat.rutgers.edu/~madigan/BBR"&gt;http://stat.rutgers.edu/~madigan/BBR&lt;/a&gt; &lt;/li&gt;&lt;li&gt;A set of Mahout slides shows that the project has received a proposal to implement logistic regression on Hadoop through Google Summer of Code, but I have not seen the result yet.&lt;/li&gt;&lt;li&gt;Two papers on large-scale logistic regression were published in 2009:&lt;br /&gt;1. &lt;a href="http://siam.org/proceedings/datamining/2009/dm09_107_singhs.pdf"&gt;Parallel Large-scale Feature Selection for Logistic Regression&lt;/a&gt;, and&lt;br /&gt;2. 
&lt;a href="http://portal.acm.org/citation.cfm?id=1557082"&gt;Large-scale Sparse Logistic Regression&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-4915064613691303322?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/4915064613691303322/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=4915064613691303322' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4915064613691303322'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4915064613691303322'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/12/nice-introduction-to-logistic.html' title='A Nice Introduction to Logistic Regression'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-102215138013164082</id><published>2009-11-23T19:15:00.000-08:00</published><updated>2009-11-23T19:21:09.377-08:00</updated><title type='text'>Cloud Computing Using GPUs</title><content type='html'>&lt;div&gt;It seems that cloud computing and supercomputing are moving from massive numbers of CPUs to massive numbers of GPUs.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In the most recent Top500 supercomputer list, most supercomputers still use CPUs and Linux, 
but one of them, #5, uses ATI Radeon™ RV770 GPUs alongside x86 CPUs.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In a well-known blog post, How Will We Keep Supercomputing Super?, the author claims that, because of the limits implied by Moore's law, the complexity of CPUs makes it costly (in frontend complexity and power consumption) to combine more of them to achieve better performance. In contrast, it is more technologically reasonable to combine GPUs, which contain more and simpler cores than CPUs.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It is also notable that the biggest GPU producer, Nvidia, recently launched its cloud computing product, RealityServer, on Amazon's cloud computing platform.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-102215138013164082?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/102215138013164082/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=102215138013164082' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/102215138013164082'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/102215138013164082'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/11/clouding-computing-using-gpus.html' title='Cloud Computing Using GPUs'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-4632487250788402109</id><published>2009-11-02T02:12:00.000-08:00</published><updated>2009-11-02T02:13:39.703-08:00</updated><title type='text'>How to Write a Spelling Correction Program</title><content type='html'>&lt;a href="http://norvig.com/spell-correct.html"&gt;This&lt;/a&gt; is an excellent article by Peter Norvig, a research director of Google.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-4632487250788402109?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/4632487250788402109/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=4632487250788402109' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4632487250788402109'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4632487250788402109'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/11/how-to-write-spelling-correction.html' title='How to Write a Spelling Correction Program'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-2320992627489172686</id><published>2009-11-01T18:30:00.000-08:00</published><updated>2009-11-01T18:31:14.205-08:00</updated><title type='text'>Emacs — Tab vs. Space</title><content type='html'>To force Emacs to insert spaces instead of tabs when you press the TAB key:&lt;br /&gt;&lt;pre&gt;M-x set-variable &amp;lt;RET&amp;gt; indent-tabs-mode &amp;lt;RET&amp;gt; nil &amp;lt;RET&amp;gt;&lt;/pre&gt;Or in your .emacs file:&lt;br /&gt;&lt;pre&gt;(setq-default indent-tabs-mode nil)&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-2320992627489172686?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/2320992627489172686/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=2320992627489172686' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2320992627489172686'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2320992627489172686'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/11/emacs-tab-vs-space.html' title='Emacs — Tab vs. 
Space'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-8241893344546482775</id><published>2009-10-20T03:39:00.000-07:00</published><updated>2009-10-20T03:41:40.368-07:00</updated><title type='text'>C++ digraphs and additional keywords</title><content type='html'>[&lt;a href="http://publib.boulder.ibm.com/infocenter/macxhelp/v6v81/index.jsp?topic=/com.ibm.vacpp6m.doc/compiler/ref/ruoptdig.htm"&gt;Original post&lt;/a&gt;]&lt;br /&gt;&lt;h4&gt;&lt;a name="Header_288"&gt;&lt;/a&gt;&lt;/h4&gt; &lt;p&gt;A digraph is a keyword or combination of keys that lets you produce a character that is not available on all keyboards. &lt;/p&gt;&lt;p&gt;The digraph key combinations are: &lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;&lt;table&gt; &lt;tbody&gt;&lt;tr&gt; &lt;th id="COL1" valign="top" width="50%" align="center"&gt;Key Combination &lt;/th&gt;&lt;th id="COL2" valign="top" width="50%" align="center"&gt;Character Produced &lt;/th&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td headers="COL1" valign="top" width="50%" align="center"&gt;&lt;% &lt;/td&gt;&lt;td headers="COL2" valign="top" width="50%" align="center"&gt;{ &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td headers="COL1" valign="top" width="50%" align="center"&gt;%&gt; &lt;/td&gt;&lt;td headers="COL2" valign="top" width="50%" align="center"&gt;} &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td headers="COL1" valign="top" width="50%" align="center"&gt;&lt;: &lt;/td&gt;&lt;td headers="COL2" valign="top" width="50%" align="center"&gt;[ &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td headers="COL1" valign="top" width="50%" align="center"&gt;:&gt; &lt;/td&gt;&lt;td headers="COL2" 
valign="top" width="50%" align="center"&gt;] &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td headers="COL1" valign="top" width="50%" align="center"&gt;%: (ISO C++; the IBM page linked above lists %%) &lt;/td&gt;&lt;td headers="COL2" valign="top" width="50%" align="center"&gt;# &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; &lt;/blockquote&gt; &lt;p&gt;Additional keywords, valid in C++ programs only, are: &lt;/p&gt;&lt;blockquote&gt;&lt;br /&gt;&lt;table&gt; &lt;tbody&gt;&lt;tr&gt; &lt;th id="COL1" valign="top" width="50%" align="center"&gt;Keyword &lt;/th&gt;&lt;th id="COL2" valign="top" width="50%" align="center"&gt;Equivalent Operator &lt;/th&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td headers="COL1" valign="top" width="50%" align="center"&gt;&lt;tt&gt;bitand&lt;/tt&gt; &lt;/td&gt;&lt;td headers="COL2" valign="top" width="50%" align="center"&gt;&amp;amp; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td headers="COL1" valign="top" width="50%" align="center"&gt;&lt;tt&gt;and&lt;/tt&gt; &lt;/td&gt;&lt;td headers="COL2" valign="top" width="50%" align="center"&gt;&amp;amp;&amp;amp; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td headers="COL1" valign="top" width="50%" align="center"&gt;&lt;tt&gt;bitor&lt;/tt&gt; &lt;/td&gt;&lt;td headers="COL2" valign="top" width="50%" align="center"&gt;| &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td headers="COL1" valign="top" width="50%" align="center"&gt;&lt;tt&gt;or&lt;/tt&gt; &lt;/td&gt;&lt;td headers="COL2" valign="top" width="50%" align="center"&gt;|| &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td headers="COL1" valign="top" width="50%" align="center"&gt;&lt;tt&gt;xor&lt;/tt&gt; &lt;/td&gt;&lt;td headers="COL2" valign="top" width="50%" align="center"&gt;^ &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td headers="COL1" valign="top" width="50%" align="center"&gt;&lt;tt&gt;compl&lt;/tt&gt; &lt;/td&gt;&lt;td headers="COL2" valign="top" width="50%" align="center"&gt;~ &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td headers="COL1" valign="top" width="50%" align="center"&gt;&lt;tt&gt;and_eq&lt;/tt&gt; &lt;/td&gt;&lt;td headers="COL2" 
valign="top" width="50%" align="center"&gt;&amp;amp;= &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td headers="COL1" valign="top" width="50%" align="center"&gt;&lt;tt&gt;or_eq&lt;/tt&gt; &lt;/td&gt;&lt;td headers="COL2" valign="top" width="50%" align="center"&gt;|= &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td headers="COL1" valign="top" width="50%" align="center"&gt;&lt;tt&gt;xor_eq&lt;/tt&gt; &lt;/td&gt;&lt;td headers="COL2" valign="top" width="50%" align="center"&gt;^= &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td headers="COL1" valign="top" width="50%" align="center"&gt;&lt;tt&gt;not&lt;/tt&gt; &lt;/td&gt;&lt;td headers="COL2" valign="top" width="50%" align="center"&gt;! &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td headers="COL1" valign="top" width="50%" align="center"&gt;&lt;tt&gt;not_eq&lt;/tt&gt; &lt;/td&gt;&lt;td headers="COL2" valign="top" width="50%" align="center"&gt;!= &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; &lt;/blockquote&gt; &lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-8241893344546482775?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/8241893344546482775/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=8241893344546482775' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8241893344546482775'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8241893344546482775'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/10/c-digraphs-and-additional-keywords.html' title='C++ digraphs and additional keywords'/><author><name>Yi 
Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-9042752098270974143</id><published>2009-10-01T23:01:00.000-07:00</published><updated>2009-10-01T23:04:20.270-07:00</updated><title type='text'>To Make Firefox Display PDF on Mac OS X</title><content type='html'>On Windows, installing Adobe Reader is enough for Firefox to find the PDF plugin.  On Mac OS X, however, we need to install the PDF Browser Plugin from &lt;a href="http://www.schubert-it.com/pluginpdf/"&gt;Schubert|it&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-9042752098270974143?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/9042752098270974143/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=9042752098270974143' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/9042752098270974143'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/9042752098270974143'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/10/to-make-firefox-display-pdf-on-mac-os-x.html' title='To Make Firefox Display PDF on Mac OS X'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-2273676173903743742</id><published>2009-09-29T05:39:00.000-07:00</published><updated>2009-09-29T05:44:11.765-07:00</updated><title type='text'>VLHMM for Web Applications</title><content type='html'>I am glad to find that a WWW'09 paper cites my work on VLHMM (variable-length hidden Markov model).  In this paper, &lt;a href="http://www2009.org/proceedings/pdf/p191.pdf"&gt;&lt;span style="font-style: italic;"&gt;Towards Context-Aware Search by Learning a Very Large Variable Length Hidden Markov Model from Search Logs&lt;/span&gt;&lt;/a&gt;, the authors propose learning a very large VLHMM to model Web user behavior.  I also revisited my old &lt;a href="http://dbgroup.cs.tsinghua.edu.cn/wangyi/VLHMM/"&gt;Web page on VLHMM&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-2273676173903743742?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/2273676173903743742/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=2273676173903743742' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2273676173903743742'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2273676173903743742'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/09/vlhmm-for-web-applications.html' title='VLHMM for Web Applications'/><author><name>Yi 
Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-7874351944109671121</id><published>2009-09-27T19:47:00.000-07:00</published><updated>2009-09-27T19:48:06.498-07:00</updated><title type='text'>Merge columns from two text files</title><content type='html'>To merge two text files side by side, line by line:&lt;pre&gt;pr -m -t -s\  file1 file2&lt;/pre&gt;Here &lt;tt&gt;-m&lt;/tt&gt; prints the files in parallel, one per column, &lt;tt&gt;-t&lt;/tt&gt; omits the page headers and trailers, and &lt;tt&gt;-s\ &lt;/tt&gt; (an escaped space) uses a single space as the column separator.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-7874351944109671121?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/7874351944109671121/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=7874351944109671121' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7874351944109671121'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7874351944109671121'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/09/merge-columns-from-two-text-files.html' title='Merge columns from two text files'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-5134304307776999241</id><published>2009-08-31T03:46:00.001-07:00</published><updated>2009-08-31T03:47:40.300-07:00</updated><title type='text'>Posting Code into Blogger Posts</title><content type='html'>This concise &lt;a href="http://pleasemakeanote.blogspot.com/2008/06/posting-source-code-in-blogger.html"&gt;article&lt;/a&gt; describes how to use SyntaxHighlighter to insert program code snippets into Blogger posts.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-5134304307776999241?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/5134304307776999241/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=5134304307776999241' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5134304307776999241'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5134304307776999241'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/08/blog-post.html' title='Posting Code into Blogger Posts'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-6232761411183593995</id><published>2009-08-31T03:34:00.000-07:00</published><updated>2009-08-31T03:37:33.699-07:00</updated><title type='text'>Bloom Filter</title><content type='html'>The &lt;a href="http://en.wikipedia.org/wiki/Bloom_filter"&gt;Bloom filter&lt;/a&gt; is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. Elements can be added to the set, but not removed (though this can be addressed with a counting filter). The more elements that are added to the set, the larger the probability of false positives.&lt;br /&gt;&lt;br /&gt;Practical applications of Bloom filters include quickly testing whether a request can be handled by a given server instance, or whether a data element is held by a given replica in a redundant system.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-6232761411183593995?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/6232761411183593995/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=6232761411183593995' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6232761411183593995'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6232761411183593995'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/08/bloom-filter.html' title='Bloom Filter'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-7298896559509954719</id><published>2009-08-12T02:17:00.000-07:00</published><updated>2009-08-31T22:59:59.721-07:00</updated><title type='text'>Be Careful with std::accumulate</title><content type='html'>To sum a vector of doubles, use the following code snippet:&lt;br /&gt;&lt;pre&gt;accumulate(timbre_topic_dist.begin(), timbre_topic_dist.end(), 0.0);&lt;/pre&gt;Do not be lazy and write 0.0 as 0: the compiler treats 0 as an &lt;tt&gt;int&lt;/tt&gt;, and the type of this initial value determines the type of the intermediate and final results of &lt;tt&gt;accumulate&lt;/tt&gt;, so every partial sum is truncated to an integer.  
For reference, here is one of the overloads of &lt;tt&gt;accumulate&lt;/tt&gt;:&lt;br /&gt;&lt;pre&gt;template &amp;lt;typename _InputIterator, typename _Tp&amp;gt;&lt;br /&gt;_Tp accumulate(_InputIterator __first, _InputIterator __last, _Tp __init) {&lt;br /&gt;  for (; __first != __last; ++__first)&lt;br /&gt;    __init = __init + *__first;&lt;br /&gt;  return __init;&lt;br /&gt;}&lt;/pre&gt;Note that the partial result is stored in &lt;tt&gt;_Tp __init&lt;/tt&gt;, so even if we explicitly pass &lt;tt&gt;plus&amp;lt;double&amp;gt;()&lt;/tt&gt; as the binary operation, each partial sum is converted back to &lt;tt&gt;int&lt;/tt&gt; and the result is still truncated:&lt;br /&gt;&lt;pre&gt;accumulate(timbre_topic_dist.begin(), timbre_topic_dist.end(), 0, // Wrong&lt;br /&gt;  plus&amp;lt;double&amp;gt;()); // Does not fix the mistake.&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-7298896559509954719?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/7298896559509954719/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=7298896559509954719' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7298896559509954719'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7298896559509954719'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/08/be-careful-with-stlaccumulate.html' title='Be Careful with std::accumulate'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-2895277551710588099</id><published>2009-07-21T19:52:00.000-07:00</published><updated>2009-07-21T20:14:02.329-07:00</updated><title type='text'>A Paper on ISMIR 2008</title><content type='html'>In the following paper, published at ISMIR 2008:&lt;ul&gt;&lt;li&gt;&lt;a href="http://ismir2008.ismir.net/papers/ISMIR2008_211.pdf"&gt;Oh Oh Oh Whoah! Towards Automatic Topic Detection In Song Lyrics&lt;/a&gt;, Florian Kleedorfer et al.&lt;/li&gt;&lt;/ul&gt;the authors present their work on using NMF (Non-negative Matrix Factorization) to extract semantic topics from song lyrics.  In Section 3.4, they write:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;i&gt; "We decide to use NMF for automatic topic detection as it is a clustering technique that results in additive representation of items (e.g., song X is represented as 10% topic A, 30% topic B and 60% topic C), a property that distinguishes it from most other clustering techniques."&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;However, several of those "other techniques", including pLSA, LDA and Mix-Noisy-OR models, share the "distinguishing property" stated by the authors.  In addition, the equivalence between NMF and pLSA has been well studied in the following papers:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://eprints.pascal-network.org/archive/00000971/01/39-gaussier.pdf"&gt;Relation between PLSA and NMF and implications&lt;/a&gt;, SIGIR 2006.&lt;/li&gt;&lt;li&gt;&lt;a href="http://ranger.uta.edu/~chqding/papers/NMFpLSIequiv.pdf"&gt;On the equivalence between Non-negative Matrix Factorization and Probabilistic Latent Semantic Indexing&lt;/a&gt;. 
Computational Statistics and Data Analysis 52 (2008) 3913–3927.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;The authors also claim that LSA cannot process large sparse matrices.  However, LSA simply applies SVD to the term-document matrix (TDM), and there are many SVD algorithms that can decompose large sparse matrices.&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-2895277551710588099?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/2895277551710588099/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=2895277551710588099' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2895277551710588099'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2895277551710588099'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/07/paper-on-ismir-2008.html' title='A Paper on ISMIR 2008'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-2223439389922057267</id><published>2009-07-15T08:21:00.000-07:00</published><updated>2009-07-15T08:29:08.010-07:00</updated><title type='text'>A Non-Sense Braille Translator</title><content type='html'>&lt;a 
href="http://4.bp.blogspot.com/_shbz2iI4vAY/Sl3024zRoKI/AAAAAAAAEJg/icDQGRB_opA/s1600-h/sample.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: left; cursor: pointer;" src="http://4.bp.blogspot.com/_shbz2iI4vAY/Sl3024zRoKI/AAAAAAAAEJg/icDQGRB_opA/s400/sample.png" alt="" id="BLOGGER_PHOTO_ID_5358708355438321826" border="0" /&gt;&lt;/a&gt;&lt;div align="center"&gt;&lt;tt&gt;An apple everyday, keeps doctor away.&lt;/tt&gt;&lt;/div&gt;&lt;br /&gt;I recently wrote a Braille translator that converts English text into Braille rendered as pretty dots that only people with good eyesight can perceive.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-2223439389922057267?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/2223439389922057267/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=2223439389922057267' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2223439389922057267'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2223439389922057267'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/07/non-sense-braille-translator.html' title='A Non-Sense Braille Translator'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://4.bp.blogspot.com/_shbz2iI4vAY/Sl3024zRoKI/AAAAAAAAEJg/icDQGRB_opA/s72-c/sample.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-7272564228417598103</id><published>2009-07-08T04:09:00.001-07:00</published><updated>2010-01-19T19:30:47.217-08:00</updated><title type='text'>Lock Screen in Mac OS X</title><content type='html'>To enable the "Lock Screen" function of Mac OS X, open the Keychain Access utility in the Applications/Utilities folder. In its Preferences dialog, select "Show Status in Menu Bar." A black padlock will appear in the menu bar in the upper right-hand corner. Close Keychain Access. Now when you click the padlock, the drop-down menu offers a "Lock Screen" option.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-7272564228417598103?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/7272564228417598103/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=7272564228417598103' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7272564228417598103'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7272564228417598103'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/07/lock-screen-in-mac-os-x_08.html' title='Lock Screen in Mac OS X'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-4069307596105931420</id><published>2009-07-08T04:09:00.000-07:00</published><updated>2009-07-08T04:10:17.899-07:00</updated><title type='text'>Lock Screen in Mac OS X</title><content type='html'>To enable the "Lock Screen" function of Mac OS X, open the Keychain Access utility in the Applications/Utilities folder. In its Preferences dialog, select "Show Status in Menu Bar." A black padlock will appear in the menu bar in the upper right-hand corner. Close Keychain Access. Now when you click the padlock, the drop-down menu offers a "Lock Screen" option.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-4069307596105931420?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/4069307596105931420/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=4069307596105931420' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4069307596105931420'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4069307596105931420'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/07/lock-screen-in-mac-os-x.html' title='Lock Screen in Mac OS X'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-3256450895755180415</id><published>2009-06-29T20:09:00.000-07:00</published><updated>2009-06-29T20:14:08.907-07:00</updated><title type='text'>Fixing Garbled Chinese Text on English-Language Windows</title><content type='html'>&lt;ol&gt;&lt;li&gt;Open "Control Panel"&lt;/li&gt;&lt;li&gt;Switch to "Category View"&lt;/li&gt;&lt;li&gt;Open "Regional and Language Options"&lt;/li&gt;&lt;li&gt;Switch to "Languages" tab&lt;/li&gt;&lt;li&gt;Click "Install files for East Asian languages", and click "Apply".&lt;/li&gt;&lt;li&gt;Switch to "Advanced" tab&lt;/li&gt;&lt;li&gt;In the combo box "Language for non-Unicode programs", select "Chinese (PRC)"&lt;/li&gt;&lt;li&gt;Switch to "Regional Options" tab&lt;/li&gt;&lt;li&gt;Select "Chinese (PRC)" and "China" respectively for each of the two combo boxes.&lt;/li&gt;&lt;li&gt;Click "OK"&lt;/li&gt;&lt;/ol&gt;Tip: If existing Chinese software still shows garbled text, reinstall the program.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-3256450895755180415?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/3256450895755180415/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=3256450895755180415' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/3256450895755180415'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/3256450895755180415'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/windows.html' title='Fixing Garbled Chinese Text on English-Language Windows'/><author><name>Yi 
Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-6916646915628744704</id><published>2009-06-29T00:10:00.000-07:00</published><updated>2009-06-29T00:11:01.496-07:00</updated><title type='text'>Learning OpenCV, the E-book</title><content type='html'>&lt;a href="http://dbgroup.cs.tsinghua.edu.cn/wangyi/misc/OReilly-LearningOpenCV.pdf"&gt;&lt;span style="font-style: italic;"&gt;Learning OpenCV&lt;/span&gt;&lt;/a&gt;, the E-book&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-6916646915628744704?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/6916646915628744704/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=6916646915628744704' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6916646915628744704'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6916646915628744704'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/learning-opencv-e-book.html' title='Learning OpenCV, the E-book'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-616557014947906927</id><published>2009-06-28T20:42:00.000-07:00</published><updated>2009-06-28T23:37:52.675-07:00</updated><title type='text'>DocView Mode for Emacs</title><content type='html'>See the &lt;a href="http://www.emacswiki.org/emacs/doc-view.el"&gt;wiki&lt;/a&gt; page.&lt;br /&gt;&lt;tt&gt;Shift-+&lt;/tt&gt; enlarges the displayed document (e.g., a .pdf file)&lt;br /&gt;&lt;tt&gt;C-c C-c&lt;/tt&gt; switches between DocView mode and text mode&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-616557014947906927?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/616557014947906927/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=616557014947906927' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/616557014947906927'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/616557014947906927'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/docview-mode-for-emacs-is-great.html' title='DocView Mode for Emacs'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-4307559821433182433</id><published>2009-06-23T02:46:00.000-07:00</published><updated>2009-06-23T18:48:25.317-07:00</updated><title type='text'>Make Emacs Warns for Long Lines</title><content type='html'>On Linux, we can use the function &lt;tt&gt;font-lock-set-up-width-warning&lt;/tt&gt; to make Emacs warn about overly long lines in source code:&lt;br /&gt;&lt;pre&gt;(add-hook 'c++-mode-hook&lt;br /&gt;      '(lambda () (font-lock-set-up-width-warning 80)))&lt;br /&gt;(add-hook 'java-mode-hook&lt;br /&gt;      '(lambda () (font-lock-set-up-width-warning 80)))&lt;br /&gt;(add-hook 'python-mode-hook&lt;br /&gt;      '(lambda () (font-lock-set-up-width-warning 80)))&lt;br /&gt;&lt;/pre&gt;On Mac OS X, the above method fails.  However, with Carbon Emacs or Aquamacs, we can do the following:&lt;br /&gt;&lt;pre&gt;; for CarbonEmacs (MacOSX)&lt;br /&gt;(defun font-lock-width-keyword (width)&lt;br /&gt;"Return a font-lock style keyword for a string beyond width WIDTH&lt;br /&gt;that uses 'font-lock-warning-face'."&lt;br /&gt;`((,(format "^%s\\(.+\\)" (make-string width ?.))&lt;br /&gt; (1 font-lock-warning-face t))))&lt;br /&gt;&lt;br /&gt;(font-lock-add-keywords 'c++-mode (font-lock-width-keyword 80))&lt;br /&gt;(font-lock-add-keywords 'objc-mode (font-lock-width-keyword 80))&lt;br /&gt;(font-lock-add-keywords 'python-mode (font-lock-width-keyword 80))&lt;br /&gt;(font-lock-add-keywords 'java-mode (font-lock-width-keyword 80))&lt;br /&gt;&lt;/pre&gt;An easier solution for the 80-column rule is &lt;a href="http://www.helsinki.fi/%7Esjpaavol/programs/lineker.el"&gt;lineker&lt;/a&gt;.  
The usage is pretty simple (tested on my IBM T60p, Emacs for Windows): add the following to your &lt;tt&gt;.emacs&lt;/tt&gt; file.&lt;br /&gt;&lt;pre&gt;(require 'lineker)&lt;br /&gt;(add-hook 'c-mode-hook 'lineker-mode)&lt;br /&gt;(add-hook 'c++-mode-hook 'lineker-mode)&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-4307559821433182433?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/4307559821433182433/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=4307559821433182433' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4307559821433182433'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4307559821433182433'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/make-emacs-warns-for-long-lines.html' title='Make Emacs Warns for Long Lines'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-356704652051772026</id><published>2009-06-23T00:10:00.000-07:00</published><updated>2009-06-23T00:21:09.140-07:00</updated><title type='text'>GCC Does Not Support Mutable Set/MultiSet Iterator</title><content type='html'>Although the C++ STL standard requires that &lt;tt&gt;std::set&lt;/tt&gt; and &lt;tt&gt;std::multiset&lt;/tt&gt; 
support both constant and mutable iterators, libstdc++ provides only the constant one.  In &lt;tt&gt;/usr/include/c++/4.0.0/bits/stl_set.h&lt;/tt&gt; (as well as &lt;tt&gt;stl_multiset.h&lt;/tt&gt;), we can see the following iterator typedefs:&lt;br /&gt;&lt;pre&gt;// _GLIBCXX_RESOLVE_LIB_DEFECTS&lt;br /&gt;// DR 103. set::iterator is required to be modifiable,&lt;br /&gt;// but this allows modification of keys.&lt;br /&gt;typedef typename _Rep_type::const_iterator iterator;&lt;br /&gt;typedef typename _Rep_type::const_iterator const_iterator;&lt;/pre&gt;This makes many mutating STL algorithms incompatible with set and multiset.  For example, the following code does not compile in GCC 4.0.x:&lt;br /&gt;&lt;pre&gt;set&amp;lt;int&amp;gt; myset;             // or multiset&amp;lt;int&amp;gt; myset;&lt;br /&gt;*myset.begin() = 100;  // fails because begin() returns a const_iterator&lt;br /&gt;remove_if(myset.begin(), myset.end(), Is71()); // remove_if invokes remove_copy_if, which requires mutable myset.begin().&lt;/pre&gt;Notably, Microsoft Visual C++ 7.0 and later versions follow the STL standard more strictly on this issue.  
The above code works with Visual C++.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-356704652051772026?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/356704652051772026/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=356704652051772026' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/356704652051772026'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/356704652051772026'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/gcc-does-not-support-mutable.html' title='GCC Does Not Support Mutable Set/MultiSet Iterator'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-6712885845200128435</id><published>2009-06-22T08:35:00.000-07:00</published><updated>2009-06-22T08:38:38.875-07:00</updated><title type='text'>Fast Approximate of 2D Water Ripples</title><content type='html'>&lt;a href="http://freespace.virgin.net/hugo.elias/graphics/x_water.htm"&gt;Here&lt;/a&gt; is a very fast approximation algorithm.  
The author is really good at observing and finding highly effective approximations; in this case, the approximation is that sine(x+o) is proportional to sine(x), where o is a small phase offset.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-6712885845200128435?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/6712885845200128435/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=6712885845200128435' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6712885845200128435'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6712885845200128435'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/fast-approximate-algorithm-for.html' title='Fast Approximate of 2D Water Ripples'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-6869637011761674592</id><published>2009-06-21T07:41:00.000-07:00</published><updated>2009-06-21T07:50:34.649-07:00</updated><title type='text'>Useful Documents for CUDA Development</title><content type='html'>&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: georgia;"&gt;&lt;a href="http://dbgroup.cs.tsinghua.edu.cn/wangyi/resources/NVIDIA_CUDA_Programming_Guide_2.2.pdf"&gt;Official Programming Guide&lt;/a&gt; from NVIDIA.&lt;/span&gt;&lt;span 
style="font-family: georgia;"&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: georgia;"&gt;&lt;a href="http://dbgroup.cs.tsinghua.edu.cn/wangyi/resources/CUDA_Programming.pdf"&gt;CUDA Programming&lt;/a&gt;, slides by Johan Seland.&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-6869637011761674592?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/6869637011761674592/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=6869637011761674592' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6869637011761674592'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6869637011761674592'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/useful-documents-for-cuda-development.html' title='Useful Documents for CUDA Development'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-2953077941176115239</id><published>2009-06-21T06:40:00.001-07:00</published><updated>2009-06-21T07:27:56.090-07:00</updated><title type='text'>Make CUDA Works on MacBook Pro</title><content type='html'>&lt;ol&gt;&lt;li&gt;Download and install CUDA toolkit and CUDA SDK.&lt;/li&gt;&lt;li&gt;When installing the CUDA toolkit, 
click the &lt;span style="font-style: italic;"&gt;Customize&lt;/span&gt; button on the Installation Type panel of the installer. Then be sure that CUDAKext is selected for installation. If we do not do this, CUDA applications will complain "no CUDA capable device".&lt;/li&gt;&lt;li&gt;After installing, add the following to .bash_profile:&lt;br /&gt;&lt;pre&gt;export PATH=/usr/local/cuda/bin:$PATH&lt;br /&gt;export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH&lt;/pre&gt;&lt;/li&gt;&lt;li&gt;After installing the CUDA SDK,&lt;br /&gt;&lt;pre&gt;cd /Developer/CUDA/lib&lt;br /&gt;ranlib *.a&lt;/pre&gt;Otherwise, we will get the following linker error when building CUDA applications:&lt;br /&gt;&lt;tt&gt;ld: in ../../lib/libcutil.a, archive has no table of contents&lt;/tt&gt;&lt;/li&gt;&lt;li&gt;Build CUDA sample applications:&lt;br /&gt;&lt;pre&gt;cd /Developer/CUDA&lt;br /&gt;make&lt;/pre&gt;The resulting application binaries are placed in &lt;tt&gt;/Developer/CUDA/bin/darwin/release&lt;/tt&gt;.&lt;/li&gt;&lt;/ol&gt;A useful reference for installation and system requirements is a PDF file named &lt;tt&gt;CUDA_Getting_Started_2.2_MacOS.pdf&lt;/tt&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-2953077941176115239?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/2953077941176115239/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=2953077941176115239' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2953077941176115239'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2953077941176115239'/><link rel='alternate' 
type='text/html' href='http://cxwangyi.blogspot.com/2009/06/make-cuda-works-on-macbook-pro.html' title='Make CUDA Works on MacBook Pro'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-5374715657383210159</id><published>2009-06-21T06:25:00.000-07:00</published><updated>2009-06-21T06:28:46.857-07:00</updated><title type='text'>A Good Xcode/C++/QuickDraw Tutorial</title><content type='html'>&lt;a href="http://kknapp.sd38.ca/IT_Tutorial_Units/Apple/Xcode_CPP/xcindex.html"&gt;Xcode C++  Tutorials&lt;/a&gt; is a C++ tutorial using Xcode as the development tool and QuickDraw (the 2D graphics engine of Carbon under Mac OS X) as the basic framework.  
It is good for C++ beginners using Mac OS X, as well as developers with experience on other platforms.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-5374715657383210159?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/5374715657383210159/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=5374715657383210159' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5374715657383210159'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/5374715657383210159'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/good-xcodecquickdraw-tutorial.html' title='A Good Xcode/C++/QuickDraw Tutorial'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-4869372294613881103</id><published>2009-06-20T21:49:00.001-07:00</published><updated>2009-06-20T21:49:50.343-07:00</updated><title type='text'>Mix Intel IPP with OpenCV</title><content type='html'>http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-open-source-computer-vision-library-opencv-faq/&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-4869372294613881103?l=cxwangyi.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/4869372294613881103/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=4869372294613881103' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4869372294613881103'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4869372294613881103'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/mix-intel-ipp-with-opencv.html' title='Mix Intel IPP with OpenCV'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-3739841494213776322</id><published>2009-06-20T19:44:00.000-07:00</published><updated>2009-06-20T21:40:53.690-07:00</updated><title type='text'>Build OpenCV under Mac OS X</title><content type='html'>Install OpenCV on Mac OS X&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Install Xcode on Mac OS X computer.&lt;/li&gt;&lt;li&gt;Download OpenCV source package (&lt;span style="font-weight: bold;"&gt;for Linux&lt;/span&gt;) from SourceForge.  
&lt;/li&gt;&lt;li&gt;Unpack the source package.&lt;/li&gt;&lt;li&gt;Generate Makefile.in/am&lt;br /&gt;&lt;tt&gt;autoreconf -i --force&lt;/tt&gt;&lt;/li&gt;&lt;li&gt;Configure&lt;br /&gt;&lt;tt&gt;./configure --prefix=/usr/local --with-python --with-swig&lt;/tt&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Build and test the build result&lt;br /&gt;&lt;tt&gt;make&lt;/tt&gt;&lt;br /&gt;&lt;tt&gt;make check&lt;/tt&gt;&lt;/li&gt;&lt;li&gt;Install&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family: courier new;"&gt;make install&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Build applications&lt;br /&gt; &lt;span style="font-size:85%;"&gt;&lt;span style="font-family: courier new;"&gt;g++ -o capcam main.cc -I /usr/local/include/opencv -L/usr/local/lib -lcxcore -lcv -lcvaux -lhighgui -lml&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;Other sources of information:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.cs.iit.edu/%7Eagam/cs512/lect-notes/opencv-intro/"&gt;Introduction to programming with OpenCV&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-3739841494213776322?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/3739841494213776322/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=3739841494213776322' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/3739841494213776322'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/3739841494213776322'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/build-opencv-under-mac-os-x.html' 
title='Build OpenCV under Mac OS X'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-7024344210929320923</id><published>2009-06-20T18:48:00.000-07:00</published><updated>2009-06-20T18:53:47.642-07:00</updated><title type='text'>Package Management under Mac OS X</title><content type='html'>Under Windows, we can use Cygwin to manage software packages.  Under Mac OS X, we can use Macports.  Download and install the Macports dmg file, open a terminal window, and type commands like:&lt;br /&gt;&lt;pre&gt;sudo port install cmake&lt;/pre&gt;Macports will download the source package and compile it for you.&lt;br /&gt;&lt;br /&gt;Note: To keep Macports aware of the most recent package list, run the following command regularly:&lt;br /&gt;&lt;pre&gt;sudo port -v selfupdate&lt;/pre&gt;&lt;br /&gt;Note: After installing Macports, open a &lt;span style="font-weight: bold;"&gt;new&lt;/span&gt; terminal window (program), which will use the system environment variables newly updated by the installation program.  
Using a terminal window that was opened before the Macports installation will lead to an error complaining that 'port' cannot be found.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-7024344210929320923?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/7024344210929320923/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=7024344210929320923' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7024344210929320923'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7024344210929320923'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/package-management-under-mac-os-x.html' title='Package Management under Mac OS X'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-1788438202412413439</id><published>2009-06-18T22:28:00.000-07:00</published><updated>2009-06-18T22:31:16.748-07:00</updated><title type='text'>ActionScript 3.0 Mode for Emacs</title><content type='html'>&lt;a href="http://www.emacswiki.org/emacs/ActionScriptMode"&gt;EmacsWiki&lt;/a&gt; suggests an actionscript mode which can be downloaded from a &lt;a href="http://blog.pettomato.com/?p=24"&gt;post&lt;/a&gt; at PetTomato.  
After downloading the .el file, I added the following lines to my .emacs configuration file.  This works for my Aquamacs on Mac OS X.&lt;br /&gt;&lt;pre&gt;(load-file "~/.emacs.d/actionscript-mode.el")&lt;br /&gt;(autoload 'actionscript-mode "javascript" nil t)&lt;br /&gt;(add-to-list 'auto-mode-alist '("\\.as\\'" . actionscript-mode))&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-1788438202412413439?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/1788438202412413439/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=1788438202412413439' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1788438202412413439'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1788438202412413439'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/actionscript-30-mode-for-emacs.html' title='ActionScript 3.0 Mode for Emacs'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-2227614252405309196</id><published>2009-06-14T19:46:00.000-07:00</published><updated>2009-06-14T19:47:54.200-07:00</updated><title type='text'>Gmsh: a three-dimensional finite element mesh generator</title><content type='html'>&lt;span class="Apple-style-span" 
style="font-family: Times; "&gt;&lt;a href="http://www.geuz.org/gmsh/"&gt;Gmsh&lt;/a&gt; is an automatic 3D finite element grid generator with a built-in CAD engine and post-processor. Its design goal is to provide a simple meshing tool for academic problems with parametric input and advanced visualization capabilities.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-2227614252405309196?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/2227614252405309196/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=2227614252405309196' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2227614252405309196'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2227614252405309196'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/gmsh-three-dimensional-finite-element.html' title='Gmsh: a three-dimensional finite element mesh generator'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-4607563497003392715</id><published>2009-06-13T20:19:00.000-07:00</published><updated>2009-06-13T20:21:56.437-07:00</updated><title type='text'>Fullscreen Mode of Aquamacs</title><content type='html'>&lt;div&gt;We can type &lt;span class="Apple-style-span"  
style="font-family:'lucida grande';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;Command-Shift-Return&lt;/span&gt;&lt;/span&gt; to invoke the function &lt;span class="Apple-style-span" style="font-family: 'lucida grande';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;aquamacs-toggle-full-frame&lt;/span&gt;&lt;/span&gt;, which toggles fullscreen mode.  For more on customizing Aquamacs, refer to &lt;span class="Apple-style-span"  style="font-family:'lucida grande';"&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;&lt;a href="http://www.emacswiki.org/emacs/CustomizeAquamacs"&gt;http://www.emacswiki.org/emacs/CustomizeAquamacs&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-4607563497003392715?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/4607563497003392715/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=4607563497003392715' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4607563497003392715'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4607563497003392715'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/fullscreen-mode-of-aquamacs.html' title='Fullscreen Mode of Aquamacs'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-7080632412581739444</id><published>2009-06-13T20:12:00.000-07:00</published><updated>2009-06-13T20:14:50.237-07:00</updated><title type='text'>Robust Ray Intersect with Triangle Test</title><content type='html'>&lt;div&gt;&lt;div&gt;Tomas Moller and Ben Trumbore proposed a robust algorithm in their paper, &lt;span class="Apple-style-span" style="font-style: italic;"&gt;Fast, Minimum Storage Ray/Triangle Intersection&lt;/span&gt;.  A reference implementation of this algorithm can be found at &lt;a href="http://www.lighthouse3d.com/opengl/maths/index.php?raytriint"&gt;http://www.lighthouse3d.com/opengl/maths/index.php?raytriint&lt;/a&gt;, which implements the non-culling branch presented in that paper.&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-7080632412581739444?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/7080632412581739444/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=7080632412581739444' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7080632412581739444'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7080632412581739444'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/robust-ray-intersect-with-triangle-test.html' title='Robust Ray Intersect with Triangle Test'/><author><name>Yi 
Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-6662373682826260150</id><published>2009-06-13T20:06:00.000-07:00</published><updated>2009-06-13T20:09:46.068-07:00</updated><title type='text'>Alt and Meta in Aquamacs</title><content type='html'>&lt;span class="Apple-style-span"  style=" ;font-family:Times;"&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;The default Meta key in the Aquamacs distribution is Option (and also Esc). If this is unusable for you (your fingers are too well trained on other platforms), you can press Apple-; (Options → Option Key → Option Key for Meta) to switch to Esc only.&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Times;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style=" ;font-family:Times;"&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;For other interesting things about Aquamacs, refer to the &lt;/span&gt;&lt;a href="http://www.emacswiki.org/emacs/AquamacsFAQ"&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;Aquamacs FAQ&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-6662373682826260150?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' 
type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/6662373682826260150/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=6662373682826260150' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6662373682826260150'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6662373682826260150'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/alt-and-meta-in-aquamacs.html' title='Alt and Meta in Aquamacs'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-4155339482878148068</id><published>2009-06-10T09:42:00.000-07:00</published><updated>2009-06-10T09:44:08.878-07:00</updated><title type='text'>Point inside Polyhedron</title><content type='html'>From the book, &lt;a href="http://books.google.com/books?id=WGpL6Sk9qNAC&amp;amp;pg=PA206&amp;amp;lpg=PA206&amp;amp;dq=Point+inside+polyhedron&amp;amp;source=bl&amp;amp;ots=Pl1QmK-heP&amp;amp;sig=N0zwakzL0x_UAjE_wWsqJzeXg7c&amp;amp;hl=en&amp;amp;ei=z-AvSsjxIJqytAP6heDXCA&amp;amp;sa=X&amp;amp;oi=book_result&amp;amp;ct=result&amp;amp;resnum=8"&gt;Real-time Collision Detection&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-4155339482878148068?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://cxwangyi.blogspot.com/feeds/4155339482878148068/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=4155339482878148068' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4155339482878148068'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4155339482878148068'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/point-inside-polyhedron.html' title='Point inside Polyhedron'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-3532648461042087440</id><published>2009-06-09T19:37:00.000-07:00</published><updated>2009-06-09T19:47:21.941-07:00</updated><title type='text'>Password-less Login Using SSH</title><content type='html'>Suppose we want to log in from a MacBook Pro (mbp) to a remote Linux machine (tsingyi), and both computers have OpenSSH installed.  
To make tsingyi trust mbp, we generate an RSA key pair (a public key and a private key) for mbp, which will be used to identify mbp during login.&lt;br /&gt;&lt;br /&gt;To generate the pair of keys, on mbp, type&lt;br /&gt;&lt;pre&gt;ssh-keygen -t rsa&lt;br /&gt;&lt;/pre&gt;Accept all default answers, and we get two files:&lt;br /&gt;&lt;pre&gt;~/.ssh/id_rsa     --- the private key&lt;br /&gt;~/.ssh/id_rsa.pub --- the public key&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;Now, copy the public key file to tsingyi by typing the following command on mbp:&lt;br /&gt;&lt;pre&gt;scp ~/.ssh/id_rsa.pub wyi@tsingyi:/home/wyi/.ssh/id_rsa-mbp.pub&lt;br /&gt;&lt;/pre&gt;and append the public key of mbp to ~/.ssh/authorized_keys on tsingyi by typing the following command on tsingyi:&lt;br /&gt;&lt;pre&gt;cat ~/.ssh/id_rsa-mbp.pub &gt;&gt; ~/.ssh/authorized_keys&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;That is all.  We should now be able to ssh to tsingyi from mbp without typing a password.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-3532648461042087440?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/3532648461042087440/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=3532648461042087440' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/3532648461042087440'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/3532648461042087440'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/password-less-login-using-ssh.html' title='Password-less Login Using SSH'/><author><name>Yi 
Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-1991089730307769837</id><published>2009-06-09T18:50:00.000-07:00</published><updated>2009-06-09T19:04:15.786-07:00</updated><title type='text'>Build OpenGL/GLUT Applications under Mac OS X</title><content type='html'>Xcode comes with everything we need to build a Cocoa application with OpenGL and GLUT, including GCC.  So, first of all, we need to install Xcode.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Writing Code&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;A GLUT C/C++ program for Mac OS X should include three header files:&lt;br /&gt;&lt;pre&gt;&lt;span style="color: rgb(204, 51, 204);"&gt;#include &amp;lt;OpenGL/gl.h&amp;gt;   // Header file for the OpenGL library&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(204, 51, 204);"&gt;#include &amp;lt;OpenGL/glu.h&amp;gt;  // Header file for the GLU library&lt;/span&gt;&lt;br /&gt;&lt;span 
style="color: rgb(204, 51, 204);"&gt;#include &amp;lt;GLUT/glut.h&amp;gt;   // Header file for the GLUT library&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;Note that the locations of these header files on Mac OS X differ from where they are on Linux and Windows (e.g., GL/glut.h).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Building Using GCC&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The GCC command line to build a program main.c is as follows:&lt;br /&gt;&lt;pre&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;gcc -framework GLUT -framework OpenGL -framework Cocoa main.c -o learning&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;Note that Mac OS X uses the concept of so-called &lt;span style="font-style: italic; color: rgb(153, 0, 0);"&gt;frameworks&lt;/span&gt;. Instead of adding include paths and library names yourself, you add a framework to your compiler call. This is a Mac OS X-specific extension to gcc.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Building Using Xcode IDE&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We can also manage our OpenGL/GLUT projects using the Xcode IDE.  To create a project, select the project type "Cocoa Application".  To add/edit the code, remove the auto-generated main.m, add a new main.c, and write our code into main.c.  
To specify the frameworks in the IDE, right-click the project and choose "Add Existing Frameworks" to add OpenGL and GLUT.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-1991089730307769837?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/1991089730307769837/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=1991089730307769837' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1991089730307769837'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1991089730307769837'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/buld-openglglut-applications-under-mac.html' title='Build OpenGL/GLUT Applications under Mac OS X'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-6187862796279961273</id><published>2009-06-08T22:50:00.000-07:00</published><updated>2009-06-08T22:51:10.283-07:00</updated><title type='text'>Open Source Software on Mac OS X</title><content type='html'>http://www.opensourcemac.org/&lt;br /&gt;http://www.freemacware.com/&lt;br /&gt;http://www.linuxbeacon.com/doku.php?id=opensourcemac&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/6983561823392851671-6187862796279961273?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/6187862796279961273/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=6187862796279961273' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6187862796279961273'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/6187862796279961273'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/open-source-software-on-mac-os-x.html' title='Open Source Software on Mac OS X'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-734520683615011017</id><published>2009-06-07T08:42:00.001-07:00</published><updated>2009-06-07T08:42:51.267-07:00</updated><title type='text'>Simulation of Cracks</title><content type='html'>&lt;span class="Apple-style-span" style="font-size: medium;"&gt;The &lt;a href="http://www.mathematik.uni-ulm.de/stochastik/personal/schmidt/publications/PVCVT_paper.pdf"&gt;paper&lt;/a&gt;: Simulation of the typical Poisson-Voronoi-Cox-Voronoi cell&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-734520683615011017?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' 
type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/734520683615011017/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=734520683615011017' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/734520683615011017'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/734520683615011017'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/simulation-of-cracks.html' title='Simulation of Cracks'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-4870057798411698206</id><published>2009-06-07T08:16:00.001-07:00</published><updated>2009-06-07T08:30:38.214-07:00</updated><title type='text'>Multivariate Poisson Models</title><content type='html'>&lt;ol&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-size: medium;"&gt;The &lt;/span&gt;&lt;a href="http://www.stat-athens.aueb.gr/~karlis/multivariate%20Poisson%20models.pdf"&gt;&lt;span class="Apple-style-span" style="font-size: medium;"&gt;slides&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="font-size: medium;"&gt; on Multivariate Poisson Models.&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 20.5px Helvetica"&gt;&lt;span class="Apple-style-span"  style="font-family:georgia;"&gt;&lt;span class="Apple-style-span" style="font-size: medium;"&gt;The &lt;/span&gt;&lt;a 
href="http://filebox.vt.edu/users/pasupath/papers/shin-pasupathyR3.pdf"&gt;&lt;span class="Apple-style-span" style="font-size: medium;"&gt;paper&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="font-size: medium;"&gt;: An Algorithm for Fast Generation of Bivariate Poisson Random Vectors&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-4870057798411698206?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/4870057798411698206/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=4870057798411698206' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4870057798411698206'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/4870057798411698206'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/multivariate-poisson-models.html' title='Multivariate Poisson Models'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-1887442094638687295</id><published>2009-06-07T08:11:00.000-07:00</published><updated>2009-06-07T23:14:02.556-07:00</updated><title type='text'>Crystallization</title><content type='html'>&lt;ol&gt;&lt;li&gt;Stochastic and Deterministic Simulation of Nonisothermal Crystallization of 
Polymers&lt;/li&gt;&lt;li&gt;&lt;a href="http://departments.kings.edu/chemlab/vrml/index.html"&gt;The Structure of Crystals&lt;/a&gt;.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-1887442094638687295?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/1887442094638687295/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=1887442094638687295' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1887442094638687295'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/1887442094638687295'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/06/crystalization.html' title='Crystallization'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-7018287027100248671</id><published>2009-05-27T18:53:00.000-07:00</published><updated>2009-06-22T08:34:27.935-07:00</updated><title type='text'>To Get Forward (Alt-f), Backward (Alt-b) and Delete (Alt-d) Word Works for iTerm</title><content type='html'>&lt;ol&gt;&lt;li&gt;Open iTerm.&lt;/li&gt;&lt;li&gt;Go to Bookmarks &gt; Manage Profiles&lt;/li&gt;&lt;li&gt;Choose Keyboard Profiles on the left and edit the Global Profile&lt;/li&gt;&lt;li&gt;Next to Mapping, click the + 
sign.&lt;/li&gt;&lt;li&gt;For Key, choose hex code.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;In the text box next to hex code, enter 0x62 for b, 0x64 for d or 0x66 for f.&lt;br /&gt;&lt;span style="color: rgb(204, 102, 204);"&gt;Note that 0x62, 0x64 and 0x66 are ASCII codes for characters b, d, and f respectively&lt;/span&gt;.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;For Modifier, check the Option Box&lt;/li&gt;&lt;li&gt;For Action, choose send escape sequence.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Write b, d or f in the input field.&lt;/li&gt;&lt;/ol&gt;Now Alt-f will jump forward a word, Alt-b jumps backwards a word, and Alt-d deletes a word.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-7018287027100248671?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/7018287027100248671/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=7018287027100248671' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7018287027100248671'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/7018287027100248671'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/05/iterm-tip-to-get-forward-alt-f-and.html' title='To Get Forward (Alt-f), Backward (Alt-b) and Delete (Alt-d) Word Works for iTerm'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-2047415937310524403</id><published>2009-05-14T06:31:00.001-07:00</published><updated>2009-05-14T06:34:29.522-07:00</updated><title type='text'>Insert Google Maps and Google Picasa Album into Blog Posts</title><content type='html'>&lt;iframe marginheight="0" marginwidth="0" src="http://ditu.google.cn/maps?f=q&amp;amp;source=s_q&amp;amp;hl=zh-CN&amp;amp;geocode=&amp;amp;q=%E5%8C%97%E4%BA%AC%E6%B8%85%E5%8D%8E%E7%A7%91%E6%8A%80%E5%9B%AD%0D%0A&amp;amp;sll=24.915712,121.673937&amp;amp;sspn=1.93537,2.801514&amp;amp;ie=UTF8&amp;amp;brcurrent=3,0x35f05296e7142cb9:0xb9625620af0fa98a%3B5,0&amp;amp;ll=39.89922,116.38342&amp;amp;spn=0.23849,0.300673&amp;amp;output=embed" scrolling="no" width="425" frameborder="0" height="350"&gt;&lt;/iframe&gt;&lt;br /&gt;&lt;small&gt;&lt;a href="http://ditu.google.cn/maps?f=q&amp;amp;source=embed&amp;amp;hl=zh-CN&amp;amp;geocode=&amp;amp;q=%E5%8C%97%E4%BA%AC%E6%B8%85%E5%8D%8E%E7%A7%91%E6%8A%80%E5%9B%AD%0D%0A&amp;amp;sll=24.915712,121.673937&amp;amp;sspn=1.93537,2.801514&amp;amp;ie=UTF8&amp;amp;brcurrent=3,0x35f05296e7142cb9:0xb9625620af0fa98a%3B5,0&amp;amp;ll=39.89922,116.38342&amp;amp;spn=0.23849,0.300673" style="color: rgb(0, 0, 255); text-align: left;"&gt;View larger map&lt;/a&gt;&lt;/small&gt;&lt;br /&gt;&lt;br /&gt;&lt;table style="width: 194px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="background: transparent url(http://picasaweb.google.com/s/c/transparent_album_background.gif) no-repeat scroll left center; height: 194px; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;" align="center"&gt;&lt;a href="http://picasaweb.google.com/Yi.Wang.2005/eqUIyI?feat=embedwebsite"&gt;&lt;img src="http://lh6.ggpht.com/_shbz2iI4vAY/SJ76RwLZCME/AAAAAAAACXY/VbBobhv-VYc/s160-c/eqUIyI.jpg" 
style="margin: 1px 0pt 0pt 4px;" width="160" height="160" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align: center; font-family: arial,sans-serif; font-size: 11px;"&gt;&lt;a href="http://picasaweb.google.com/Yi.Wang.2005/eqUIyI?feat=embedwebsite" style="color: rgb(77, 77, 77); font-weight: bold; text-decoration: none;"&gt;Silk flowers made by Mom&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-2047415937310524403?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/2047415937310524403/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=2047415937310524403' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2047415937310524403'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/2047415937310524403'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/05/insert-google-maps-and-google-picasa.html' title='Insert Google Maps and Google Picasa Album into Blog Posts'/><author><name>Yi Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_shbz2iI4vAY/SJ76RwLZCME/AAAAAAAACXY/VbBobhv-VYc/s72-c/eqUIyI.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6983561823392851671.post-8564145776719946731</id><published>2009-03-18T05:36:00.001-07:00</published><updated>2009-03-18T05:36:28.784-07:00</updated><title type='text'>MATLAB code for Sampling Gaussian distribution</title><content type='html'>&lt;span style="font-family: courier new;"&gt;function M = sample_gaussian(mu, Sigma, N)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;mu = mu(:);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;n=length(mu);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;[U,D,V] = svd(Sigma);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;M = randn(n,N);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;M = (U*sqrt(D))*M + mu*ones(1,N); &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;M = M';&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6983561823392851671-8564145776719946731?l=cxwangyi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cxwangyi.blogspot.com/feeds/8564145776719946731/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6983561823392851671&amp;postID=8564145776719946731' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8564145776719946731'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6983561823392851671/posts/default/8564145776719946731'/><link rel='alternate' type='text/html' href='http://cxwangyi.blogspot.com/2009/03/matlab-code-for-sampling-gaussian.html' title='MATLAB code for Sampling Gaussian distribution'/><author><name>Yi 
Wang</name><uri>http://www.blogger.com/profile/13574770689478996483</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_shbz2iI4vAY/SS1K0yGuZPI/AAAAAAAADCQ/lKEtzVSFb_Q/S220/head-only.png'/></author><thr:total>3</thr:total></entry></feed>
