Apr 19, 2010

Running Hadoop on Mac OS X (Single Node)

I installed Hadoop, built its C++ components, and built and ran Pipes programs on my iMac running Snow Leopard.

Installation and Configuration

Basically, I followed Michael G. Noll's guide, Running Hadoop On Ubuntu Linux (Single-Node Cluster), with two differences.

On Mac OS X, we need to choose Sun's JVM; this can be done in System Preferences. Then, in both .bash_profile and $HADOOP_HOME/conf/hadoop-env.sh, set the JAVA_HOME environment variable:
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home

I did not create a special account for running Hadoop. (I should have, for security reasons, but I am lazy and my iMac is only for personal development, not real computing...) So I needed to chmod a+rwx /tmp/hadoop-yiwang, where yiwang is my account name, as well as what ${user.name} refers to in core-site.xml.

After finishing installation and configuration, we should be able to start all Hadoop services, build and run Hadoop Java programs, and monitor their activities.
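Concretely, the startup and sanity checks might look like the following (a sketch assuming $HADOOP_HOME/bin is on your PATH; the web UI ports are the 0.20.x defaults):

```shell
# Launch NameNode, DataNode, JobTracker, TaskTracker, and SecondaryNameNode.
start-all.sh

# jps lists the running Java processes; all five daemons should appear.
jps

# Web UIs (0.20.x default ports):
#   NameNode:   http://localhost:50070/
#   JobTracker: http://localhost:50030/

# When done, shut everything down again.
stop-all.sh
```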

Building C++ Components

Because I do not work in Java, I write Hadoop programs using Pipes. The following steps build the Pipes C++ library on Mac OS X:
  1. Install XCode and open a terminal window
  2. cd $HADOOP_HOME/src/c++/utils
  3. ./configure
  4. make install
  5. cd $HADOOP_HOME/src/c++/pipes
  6. ./configure
  7. make install
Note that you must build utils before pipes.

Build and Run Pipes Programs

The following command shows how to link to Pipes libraries:
g++ -o wordcount wordcount.cc \
-I${HADOOP_HOME}/src/c++/install/include \
-L${HADOOP_HOME}/src/c++/install/lib \
-lhadooputils -lhadooppipes -lpthread
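For reference, a minimal wordcount.cc that the command above would compile might look like this. It is a sketch following the standard 0.20.x Pipes Mapper/Reducer API (Hadoop ships a very similar example as wordcount-simple.cc); it requires the Hadoop Pipes headers and libraries built above, so treat it as a template rather than a standalone program:

```cpp
// Minimal Hadoop Pipes word count (sketch; 0.20.x Pipes API).
#include <string>
#include <vector>

#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"

class WordCountMapper : public HadoopPipes::Mapper {
 public:
  WordCountMapper(HadoopPipes::TaskContext& context) {}
  void map(HadoopPipes::MapContext& context) {
    // Split the input line into words and emit <word, "1"> pairs.
    std::vector<std::string> words =
        HadoopUtils::splitString(context.getInputValue(), " ");
    for (size_t i = 0; i < words.size(); ++i)
      context.emit(words[i], "1");
  }
};

class WordCountReducer : public HadoopPipes::Reducer {
 public:
  WordCountReducer(HadoopPipes::TaskContext& context) {}
  void reduce(HadoopPipes::ReduceContext& context) {
    // Sum the counts emitted for each word.
    int sum = 0;
    while (context.nextValue())
      sum += HadoopUtils::toInt(context.getInputValue());
    context.emit(context.getInputKey(), HadoopUtils::toString(sum));
  }
};

int main(int argc, char* argv[]) {
  return HadoopPipes::runTask(
      HadoopPipes::TemplateFactory<WordCountMapper, WordCountReducer>());
}
```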
To run the program, we need a job configuration, as shown on the Apache Hadoop wiki page.
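Alternatively, the job properties can be passed on the command line with -D flags, as in the wiki's example. A sketch of a full run (input/output paths and the bin/wordcount HDFS location are placeholders for your own setup):

```shell
# Copy the input data and the compiled binary onto HDFS.
bin/hadoop fs -put input input
bin/hadoop fs -put wordcount bin/wordcount

# Run the Pipes job; the two java.record* properties tell Hadoop to use
# the built-in Java record reader and writer around the C++ map/reduce.
bin/hadoop pipes \
  -D hadoop.pipes.java.recordreader=true \
  -D hadoop.pipes.java.recordwriter=true \
  -program bin/wordcount \
  -input input -output output
```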

Build libHDFS

There are a few bugs in libHDFS in Apache Hadoop 0.20.2, but they are easy to fix:
  1. cd hadoop-0.20.2/src/c++/libhdfs
  2. Remove #include "error.h" from hdfsJniHelper.c
  3. Remove -Dsize_t=unsigned int from the Makefile
  4. make
  5. cp hdfs.h ../install/include/hadoop
  6. cp libhdfs.so ../install/lib
Since Mac OS X uses dyld to manage shared libraries, you need to specify the directory holding libhdfs.so using the environment variable DYLD_LIBRARY_PATH (LD_LIBRARY_PATH does not work):
export DYLD_LIBRARY_PATH=${HADOOP_HOME}/src/c++/install/lib:$DYLD_LIBRARY_PATH
You might want to add the above line to your shell configuration file (e.g., ~/.bash_profile).
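As a quick smoke test of the library, a small program against the libhdfs C API (declared in hdfs.h, which the steps above copied into install/include/hadoop) might look like the following. hdfsConnect, hdfsOpenFile, hdfsWrite, and friends are the documented entry points; the "default" connection string and the /tmp path are assumptions about your configuration:

```cpp
// Write a short message to a file on HDFS via libhdfs (sketch).
// Build roughly as:
//   g++ -o hdfs_hello hdfs_hello.cc \
//       -I${HADOOP_HOME}/src/c++/install/include \
//       -L${HADOOP_HOME}/src/c++/install/lib -lhdfs
#include <cstdio>
#include <cstring>
#include <fcntl.h>

#include "hadoop/hdfs.h"

int main() {
  // "default" picks up fs.default.name from the Hadoop configuration.
  hdfsFS fs = hdfsConnect("default", 0);
  if (fs == NULL) {
    std::fprintf(stderr, "failed to connect to HDFS\n");
    return 1;
  }
  const char* path = "/tmp/hello.txt";
  hdfsFile out = hdfsOpenFile(fs, path, O_WRONLY | O_CREAT, 0, 0, 0);
  if (out == NULL) {
    std::fprintf(stderr, "failed to open %s for writing\n", path);
    hdfsDisconnect(fs);
    return 1;
  }
  const char* msg = "hello, libhdfs\n";
  hdfsWrite(fs, out, (void*)msg, std::strlen(msg));
  hdfsCloseFile(fs, out);
  hdfsDisconnect(fs);
  return 0;
}
```

Remember to set DYLD_LIBRARY_PATH as above before running the resulting binary, or dyld will fail to locate libhdfs.so.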


Joe Buck said...

Great post, very clear instructions. Thank you very much.

One comment:
Should this line towards the end of your instructions:

Remove #include from hdfsJniHelper.c

look like this:

Remove #include "error.h" from hdfsJniHelper.c

or something along those lines?

Yi Wang said...

@Joe Buck: Thanks for the comment. Updated. :-)

Aparna said...

I was trying to install libhdfs. It builds successfully using your procedure, but my program gives an error that "libhdfs.so.0" was not found. I have added the path to the folder containing the library in both the library and include paths. It does not give any syntax error when I use the functions associated with it, but it gives a runtime error.

I am using Ubuntu.

Also, I have executed a command for the environment variable, but I am not sure in which file I should add the line.

Can anyone help me with this?

