- Hadoop Pipes, which provides a pair of C++ libraries to support Hadoop programs in C/C++ only, and
- Hadoop Streaming, which launches any executable file in map/reduce worker processes, and thus supports any language.
However, in Hadoop 0.20.1, the support for Pipes, implemented as Java code in the package org.apache.hadoop.mapred.pipes, has been marked deprecated. So I guess Hadoop 0.20.1 does not fully support Pipes. Some other forum posts have also discussed this issue.
cat input_file | map_program | sort | reduce_program
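To make the pipeline above concrete, here is a minimal C++ word-count sketch of what map_program and reduce_program could do. It is not the code from the project mentioned in this post: the function names RunMapper, RunReducer, and SortLines are hypothetical, and the sketch assumes the default Streaming convention that each line is a key and a value separated by a tab, and that the reducer sees its input sorted by key.

```cpp
// Illustrative word-count sketch for the map | sort | reduce pipeline.
// All function names here are hypothetical, chosen only for this example.
#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Mapper: emit "word<TAB>1" for every whitespace-separated token.
void RunMapper(std::istream& in, std::ostream& out) {
  std::string word;
  while (in >> word) {
    out << word << "\t1\n";
  }
}

// Reducer: expects "key<TAB>count" lines already sorted by key, and sums
// the counts of consecutive lines that share the same key.
// Assumption: counts are well-formed integers (std::stoll throws otherwise).
void RunReducer(std::istream& in, std::ostream& out) {
  std::string line, current_key;
  long long total = 0;
  bool have_key = false;
  while (std::getline(in, line)) {
    const std::string::size_type tab = line.find('\t');
    if (tab == std::string::npos) continue;  // skip lines without a tab
    const std::string key = line.substr(0, tab);
    const long long count = std::stoll(line.substr(tab + 1));
    if (have_key && key != current_key) {
      // Key changed: flush the total of the previous key.
      out << current_key << '\t' << total << '\n';
      total = 0;
    }
    current_key = key;
    have_key = true;
    total += count;
  }
  if (have_key) out << current_key << '\t' << total << '\n';
}

// Local stand-in for the `sort` stage: orders lines bytewise, which keeps
// all lines with the same key adjacent.
std::string SortLines(const std::string& text) {
  std::istringstream in(text);
  std::vector<std::string> lines;
  std::string line;
  while (std::getline(in, line)) lines.push_back(line);
  std::sort(lines.begin(), lines.end());
  std::ostringstream out;
  for (const std::string& l : lines) out << l << '\n';
  return out.str();
}
```

In a real Streaming job, the mapper and the reducer would each be a separate executable whose main simply calls RunMapper(std::cin, std::cout) or RunReducer(std::cin, std::cout); the sort stage is then done by the framework rather than by SortLines.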
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.20.1-streaming.jar \
-file ./word_count_mapper -mapper ./word_count_mapper \
-file ./word_count_reducer -reducer ./word_count_reducer \
-input ./input/*.txt -output
I have created a Google Code project, Hadoop Streaming MapReduce, to host this simple implementation, and imported the code using the following command line:
svn import hadoop-streaming-mapreduce/ https://hadoop-stream-mapreduce.googlecode.com/svn/trunk -m 'Initial import'
So you should be able to check out the code now.
2 comments:
Yi Wang,
Thanks for the instructions. Is there a README file to explain how I actually compile this code using the configure.ac and makefile.am? I've never used the output of automake before. Thanks.
Just wanted to say thanks for this blog post, this has helped a lot for a proof-of-concept I am working on :)