Apr 19, 2010

Running Hadoop on Mac OS X (Single Node)

I installed Hadoop, built its C++ components, and built and ran Pipes programs on my iMac running Snow Leopard.

Installation and Configuration

I mostly followed Michael G. Noll's guide, Running Hadoop On Ubuntu Linux (Single-Node Cluster), with two differences.

On Mac OS X, we need to select Sun's JVM. This can be done in System Preferences. Then, in both .bash_profile and $HADOOP_HOME/conf/hadoop-env.sh, set the JAVA_HOME environment variable:
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home
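Before starting Hadoop it can save time to confirm that JAVA_HOME really points at a JVM. The following is a minimal sketch; check_java_home is a hypothetical helper name of mine, and the only thing it assumes is that a usable Java home contains an executable bin/java.

```shell
# Hypothetical helper: succeeds if the given directory contains an executable bin/java.
check_java_home() {
    if [ -x "$1/bin/java" ]; then
        echo "ok: $1"
    else
        echo "no executable bin/java under: $1" >&2
        return 1
    fi
}

# Check the value currently exported in this shell; only warn on failure.
check_java_home "${JAVA_HOME:-}" || echo "fix JAVA_HOME before starting Hadoop"
```

Running the same check from a new terminal window shows whether the export in .bash_profile was picked up.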

I did not create a special account for running Hadoop. (I should, for security reasons, but I am lazy, and my iMac is only for personal development, not real computing...) So I need to chmod a+rwx /tmp/hadoop-yiwang, where yiwang is my account name, which is also what ${user.name} refers to in core-site.xml.
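The same permission change can be written without hard-coding the account name. This is only a sketch: open_hadoop_tmp is a hypothetical helper, and it assumes hadoop.tmp.dir is left at /tmp/hadoop-${user.name}, as in the setup described above.

```shell
# Hypothetical helper: open up the Hadoop temp directory for the current user.
# Pass a directory explicitly if hadoop.tmp.dir was changed in core-site.xml.
open_hadoop_tmp() {
    dir="${1:-/tmp/hadoop-$(whoami)}"
    if [ -d "$dir" ]; then
        # Same change as "chmod a+rwx /tmp/hadoop-yiwang", then show the result.
        chmod a+rwx "$dir" && ls -ld "$dir"
    else
        echo "$dir does not exist yet" >&2
        return 1
    fi
}

# Usage: open_hadoop_tmp
```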

After finishing installation and configuration, we should be able to start all Hadoop services, build and run Hadoop Java programs, and monitor their activities.
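One quick way to confirm that the services came up is to look at the output of jps, the JDK tool that lists running Java processes. The sketch below assumes the five daemons a 0.20.x single-node setup normally starts (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker). missing_daemons is a hypothetical helper name, not part of Hadoop.

```shell
# Hypothetical helper: reads `jps` output on stdin and prints every expected
# Hadoop daemon that is NOT listed. Prints nothing when all five are running.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # jps prints "<pid> <class name>", so compare the whole second field;
        # this keeps NameNode from matching SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Typical use after $HADOOP_HOME/bin/start-all.sh:
#   jps | missing_daemons
```

If a daemon is reported missing, its log file under $HADOOP_HOME/logs is usually the first place to look.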

Building C++ Components

Because I do not work in Java, I write Hadoop programs using Pipes. The following steps build the Pipes C++ library on Mac OS X:
  1. Install Xcode and open a terminal window
  2. cd $HADOOP_HOME/src/c++/utils
  3. ./configure
  4. make install
  5. cd $HADOOP_HOME/src/c++/pipes
  6. ./configure
  7. make install
Note that you must build utils before pipes.
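The steps above can be wrapped in a small function that enforces the utils-then-pipes order and stops at the first failure. This is a sketch under the same assumptions as the list (source tree under $HADOOP_HOME/src/c++, plain ./configure with no extra options); build_hadoop_cpp is a hypothetical name.

```shell
# Hypothetical helper: build utils first, then pipes, stopping on the first error.
# $1 is the Hadoop directory; it defaults to $HADOOP_HOME.
build_hadoop_cpp() {
    home="${1:-${HADOOP_HOME:-}}"
    for component in utils pipes; do
        dir="$home/src/c++/$component"
        if [ ! -d "$dir" ]; then
            echo "missing source directory: $dir" >&2
            return 1
        fi
        echo "building $component"
        # Run in a subshell so the caller's working directory is untouched.
        (cd "$dir" && ./configure && make install) || return 1
    done
}

# Usage: build_hadoop_cpp "$HADOOP_HOME"
```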

Build and Run Pipes Programs

The following command shows how to compile a program and link it against the Pipes libraries:
g++ -o wordcount wordcount.cc \
-I${HADOOP_HOME}/src/c++/install/include \
-L${HADOOP_HOME}/src/c++/install/lib \
-lhadooputils -lhadooppipes -lpthread
To run the program, we need a configuration file, as shown on the Apache Hadoop Wiki page.
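As a rough illustration of what such a configuration file can look like, the sketch below writes one from the shell. The three property names are the ones I understand the Pipes word count example to use, so check them against your Hadoop version. write_pipes_conf, the HDFS path bin/wordcount, and the file name wordcount.xml are all hypothetical choices for this example.

```shell
# Hypothetical helper: write a minimal Pipes job configuration.
# $1: HDFS path of the compiled executable, $2: local file to write.
write_pipes_conf() {
    cat > "$2" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.pipes.executable</name>
    <value>$1</value>
  </property>
  <property>
    <name>hadoop.pipes.java.recordreader</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.pipes.java.recordwriter</name>
    <value>true</value>
  </property>
</configuration>
EOF
}

# Possible sequence, assuming input data already sits in an HDFS directory in-dir:
#   hadoop fs -put wordcount bin/wordcount
#   write_pipes_conf bin/wordcount wordcount.xml
#   hadoop pipes -conf wordcount.xml -input in-dir -output out-dir
```

The executable is read from HDFS rather than from the local disk, which is why the binary is uploaded before the job is submitted.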

Build libHDFS

libHDFS in Apache Hadoop 0.20.2 has a few bugs, but they are easy to work around:
  1. cd hadoop-0.20.2/src/c++/libhdfs
  2. ./configure
  3. Remove #include "error.h" from hdfsJniHelper.c
  4. Remove -Dsize_t=unsigned int from the Makefile
  5. make
  6. cp hdfs.h ../install/include/hadoop
  7. cp libhdfs.so ../install/lib
Since Mac OS X uses dyld to manage shared libraries, you need to specify the directory holding libhdfs.so with the DYLD_LIBRARY_PATH environment variable (LD_LIBRARY_PATH does not work):
export DYLD_LIBRARY_PATH=$HADOOP_HOME/src/c++/install/lib:$DYLD_LIBRARY_PATH
You might want to add the line above to your shell configuration file (e.g., ~/.bash_profile).
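If the export ends up in a file that is sourced more than once, the same directory gets prepended repeatedly, and when the variable starts out empty the plain export leaves a trailing colon. A small guard avoids both. This is a sketch; prepend_dyld_path is a hypothetical helper name.

```shell
# Hypothetical helper: prepend a directory to DYLD_LIBRARY_PATH only if it is
# not already listed, and without leaving a stray ":" when the variable is empty.
prepend_dyld_path() {
    case ":${DYLD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;  # already present, nothing to do
        *) DYLD_LIBRARY_PATH="$1${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}" ;;
    esac
    export DYLD_LIBRARY_PATH
}

# Usage: prepend_dyld_path "$HADOOP_HOME/src/c++/install/lib"
```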

Comments:

Unknown said...

Great post, very clear instructions. Thank you very much.

One comment:
Should this line towards the end of your instructions:

Remove #include from hdfsJniHelper.c

look like this:

Remove #include "error.h" from hdfsJniHelper.c

or something along those lines?

Yi Wang said...

@Joe Buck: Thanks for the comment. Updated. :-)

Unknown said...

Hi,
I was trying to install libhdfs. It builds successfully using your procedure, but my program fails with the error "libhdfs.so.0" not found. I have added the path of the folder containing the library to both the library and include paths. There are no compile errors when I use the libhdfs functions, but I get this error at run time.

I am using Ubuntu.

I have also run the command that sets the environment variable, but I am not sure which file I should add the line to.

Can anyone help me with this?

Thanks,
Aparna
