Install Hadoop 2 on Raspberry Pi

Hadoop runs on a wide range of hardware, including small devices such as the Raspberry Pi. Setting up a Raspberry Pi to run Hadoop is not difficult. Keep reading to find out how to set up Hadoop 2 with YARN.

This post shows how to run Hadoop 2 on a single node in pseudo-distributed mode, in which each Hadoop daemon runs in a separate Java process.

We will install Hadoop by completing the following tasks:

  1. Prepare Raspberry Linux for Hadoop
  2. Download and extract Hadoop distribution files
  3. Modify Hadoop configuration
  4. Format HDFS
  5. Start Hadoop daemons
  6. Run a test application

Task 1: Prepare Raspberry Linux for Hadoop

Java Installation

Make sure Java is installed on your Raspberry Pi:

$ java -version
java version "1.8.0"
Java(TM) SE Runtime Environment (build 1.8.0-b132)
Java HotSpot(TM) Client VM (build 25.0-b70, mixed mode)
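
If you script the setup, a check along these lines can stop early when Java is missing. It is only a minimal sketch, and the helper name have_cmd is made up for illustration.

```shell
# Hypothetical helper: succeeds if the named command is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # java prints its version banner to stderr, so merge it into stdout
    java -version 2>&1 | head -n 1
else
    echo "Java not found; install a JDK before continuing" >&2
fi
```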

Configure SSH Server

Hadoop needs passwordless SSH access to all nodes. First verify that you can ssh to localhost without a passphrase prompt:

$ ssh localhost

If you are asked for a passphrase, configure passphraseless SSH access:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
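
Running the cat command twice adds the same key twice. If you repeat the setup often, something like the following keeps authorized_keys tidy. This is a sketch, not part of Hadoop or OpenSSH; authorize_key is a hypothetical helper name.

```shell
# Hypothetical helper: append a public key to an authorized_keys file
# only if that exact line is not already present.
authorize_key() {
    pub="$1"
    auth="$2"
    touch "$auth"
    # -x whole-line match, -F fixed string, -f read the pattern from the key file
    if ! grep -qxF -f "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    # With the default StrictModes setting, sshd rejects key files
    # that are writable by group or others.
    chmod 600 "$auth"
}
# Usage:
#   authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
```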

Task 2: Download and Extract Hadoop Distribution Files

First create a directory for downloaded files.

$ mkdir ~/install

Download Hadoop version 2.6.0 from an Apache mirror:

$ cd ~/install
$ wget http://apache.cbox.biz/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
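
Because the file comes from a mirror, it is worth comparing its digest with the one published by Apache before extracting it. The sketch below assumes you have looked up the expected SHA-256 value yourself; verify_sha256 and EXPECTED_DIGEST are illustrative names, not something the post or Hadoop provides.

```shell
# Hypothetical helper: succeeds only if the file's SHA-256 digest matches.
verify_sha256() {
    file="$1"
    # Published digests are sometimes upper-case; normalise before comparing.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}
# Usage (EXPECTED_DIGEST is a placeholder for the published value):
#   verify_sha256 ~/install/hadoop-2.6.0.tar.gz "$EXPECTED_DIGEST" \
#       || echo "checksum mismatch" >&2
```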

Extract distribution files.

$ cd /opt
$ sudo tar xvzf ~/install/hadoop-2.6.0.tar.gz

For convenience, create a symlink /opt/hadoop that points to the extracted directory:

$ sudo ln -s /opt/hadoop-2.6.0 /opt/hadoop

Create a directory for the Hadoop HDFS files:

$ sudo mkdir /hdfs
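
Directories created with sudo belong to root. If you plan to run the daemons as your regular login user (an assumption, since the post does not say which account runs them), that user needs write access to /hdfs. A quick way to check, using a hypothetical helper named check_writable:

```shell
# Hypothetical helper: report whether the current user can write to a directory.
check_writable() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        echo "ok: $1"
    else
        echo "not writable: $1"
        return 1
    fi
}
# Usage:
#   check_writable /hdfs || sudo chown -R "$(id -un)" /hdfs
```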

Task 3: Modify Hadoop Configuration

Edit /etc/profile with a text editor:

$ sudo vi /etc/profile

Add the following lines at the end of the file:

export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
export HADOOP_INSTALL=/opt/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin

Apply the changes:

$ source /etc/profile
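
The JAVA_HOME line resolves the /usr/bin/java symlink to the real binary and then strips the trailing bin/java. To see what the sed expression does on its own, you can feed it a path by hand. The path below is made up for illustration, and java_home_from is a hypothetical helper name.

```shell
# Same sed expression as in /etc/profile, applied to a path passed in.
java_home_from() {
    printf '%s\n' "$1" | sed "s:bin/java::"
}

# Hypothetical resolved path, for illustration only.
java_home_from /usr/lib/jvm/example-jdk/jre/bin/java
# prints /usr/lib/jvm/example-jdk/jre/
```

After sourcing the profile, echo $JAVA_HOME should print the corresponding directory on your own system.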

Hadoop configuration files are stored in $HADOOP_INSTALL/etc/hadoop.

Edit hadoop-env.sh:

$ vi /opt/hadoop/etc/hadoop/hadoop-env.sh

Set the JAVA_HOME environment variable:

# The java implementation to use.
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")

The default Hadoop heap size is 1000 MB, which is too much for a Raspberry Pi. Set it to 250 MB:

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=250

Append -client to the HADOOP_DATANODE_OPTS variable.

export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS -client"

Save the changes and exit the text editor. Then edit the following files in the same directory so that each contains the properties shown.

etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/hdfs/tmp-2-6-0</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

etc/hadoop/mapred-site.xml (Hadoop 2.6.0 ships only mapred-site.xml.template, so copy it to mapred-site.xml first)

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
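
A typo in a property name fails silently, because Hadoop simply falls back to the default. On a working install, hdfs getconf -confKey fs.defaultFS reports the value Hadoop actually sees. Before the daemons are up, a rough check of the files themselves can look like this. It is a sketch that assumes each name and value tag sits on its own line, as in the snippets above; get_prop is a hypothetical helper, not a Hadoop tool.

```shell
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop XML config file. Assumes one tag per line.
get_prop() {
    # $1 = config file, $2 = property name
    awk -v name="$2" '
        /<name>/ {
            n = $0
            gsub(/.*<name>|<\/name>.*/, "", n)   # keep only the text between the tags
            found = (n == name)
        }
        /<value>/ && found {
            v = $0
            gsub(/.*<value>|<\/value>.*/, "", v)
            print v
            exit
        }
    ' "$1"
}
# Usage:
#   get_prop /opt/hadoop/etc/hadoop/core-site.xml fs.defaultFS
```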

Task 4: Format HDFS

$ hdfs namenode -format
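
Formatting creates fresh NameNode metadata, so running it again on a node that already holds data discards the existing file system. A small guard can prevent that. The path in the usage line is an assumption: it is where the metadata lands if dfs.namenode.name.dir is left at its default under the hadoop.tmp.dir configured earlier. The helper name is_formatted is made up for this sketch.

```shell
# Hypothetical helper: a formatted NameNode directory contains current/VERSION.
is_formatted() {
    [ -f "$1/current/VERSION" ]
}
# Usage (path assumes the default dfs.namenode.name.dir under hadoop.tmp.dir):
#   is_formatted /hdfs/tmp-2-6-0/dfs/name || hdfs namenode -format
```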

Task 5: Start Hadoop Daemons

To start Hadoop, start the HDFS and YARN daemons:

$ start-dfs.sh
$ start-yarn.sh

Verify all the daemons are running:

$ jps
16496 NameNode
16754 SecondaryNameNode
3283 Jps
16996 ResourceManager
16599 DataNode
17098 NodeManager
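
Checking the jps list by eye gets tedious after a few restarts. The following sketch reads jps output and prints any of the five daemons that is missing; missing_daemons is a hypothetical helper name.

```shell
# Hypothetical helper: read `jps` output on stdin and print every expected
# daemon that is missing. Returns non-zero if anything is missing.
missing_daemons() {
    out=$(cat)
    rc=0
    for d in NameNode SecondaryNameNode DataNode ResourceManager NodeManager; do
        # -w so that "NameNode" does not match inside "SecondaryNameNode"
        if ! printf '%s\n' "$out" | grep -qw "$d"; then
            echo "$d"
            rc=1
        fi
    done
    return $rc
}
# Usage:
#   jps | missing_daemons
```

If a daemon is reported missing, its log file under $HADOOP_INSTALL/logs is the first place to look.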

Task 6: Run a Test Map-Reduce Application

Prepare source data and HDFS directories.

$ cd $HADOOP_INSTALL
$ hdfs dfs -mkdir -p /user/pi/input
$ hdfs dfs -put LICENSE.txt input/license.txt

Run the wordcount map-reduce application.

$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount input output

The job takes a while to finish on a Raspberry Pi.

Inspect the result:

$ hdfs dfs -cat output/*
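
The wordcount output is one word and its count per line, separated by a tab, in no useful order. To see the most frequent words first, you can pipe it through sort. A minimal sketch, with top_words as a hypothetical helper name:

```shell
# Hypothetical helper: read "word<TAB>count" lines on stdin and print the
# N most frequent words (default 10).
top_words() {
    sort -t "$(printf '\t')" -k2,2nr | head -n "${1:-10}"
}
# Usage:
#   hdfs dfs -cat output/* | top_words 5
```

Note that the job refuses to start if the output directory already exists, so remove it with hdfs dfs -rm -r output before running wordcount again.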
