This post covers installing Hadoop 2.4.1 (the latest stable version) as a Single Node Cluster on Ubuntu 14.04. The Single Node Cluster installation is suitable only for beginners and for practice. Since Hadoop is meant for distributed environments, a Multi Node Cluster is more suitable for production. The Multi Node Cluster installation steps are almost the same as the Single Node Cluster steps, with a few configuration changes. My next post will cover the Multi Node Cluster Hadoop installation.
The Hadoop installation requires a few prerequisite steps: installing Java and ssh, creating a dedicated Linux user account, disabling IPv6, and generating an ssh key pair for that account. These prerequisites are needed because Hadoop nodes communicate over the Secure Shell protocol (ssh). Disabling IPv6 is required since Hadoop does not support IPv6.
Step 1: Install Oracle Java JDK7 on Ubuntu.
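- Oracle JDK7 is not in the stock Ubuntu 14.04 repositories. One common approach (a sketch, assuming the third-party webupd8team PPA and its oracle-java7-installer package; note this installs Java under /usr/lib/jvm/java-7-oracle, so adjust JAVA_HOME in the later steps if the /usr/java/jdk1.7.0_51 path used below does not exist on your machine):
# add the webupd8team PPA and install the Oracle Java 7 installer package
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
# verify the installation
java -version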
Step 2: Install SSH-Server
- To install the OpenSSH server, execute the following command in a terminal
sudo apt-get install openssh-server
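- To confirm the ssh daemon is running, a quick check (my addition, assuming Ubuntu 14.04's default upstart service manager) is:
# should report that ssh is running
sudo service ssh status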
Step 3: Create a Dedicated Hadoop User Account
- Execute the following commands to create the user group and user account. Give all required information while creating the user account and give the password as "hduser".
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo
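- To verify the new account (my addition, not in the original steps), the id command prints the user's uid and group memberships:
# hduser should show hadoop as its primary group and sudo as a supplementary group
id hduser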
- Open the "/etc/sysctl.conf" file with gedit as sudo user.
sudo gedit /etc/sysctl.conf
- Disable IPv6 by adding/modifying the following lines at the end of the file.
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
- Save & close the file then reboot the machine.
sudo shutdown -r now
- Check the configuration
sudo sysctl -p
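- Another quick check (my addition, not in the original steps) is to read the kernel flag directly; it prints 1 once IPv6 is disabled:
# 1 means IPv6 is disabled
cat /proc/sys/net/ipv6/conf/all/disable_ipv6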
Step 5: Generate an SSH Key for hduser
- Execute the following commands to generate an ssh key pair with an empty passphrase and authorize it for passwordless login.
su - hduser
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
- Verify the passwordless ssh login to localhost.
su - hduser
ssh localhost
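- On the first connection, ssh will ask you to confirm the host's fingerprint; answer "yes". If the key setup is correct, no password is requested. Then leave the test session:
# return from the ssh session to the original shell
exit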
Step 6: Download, Extract and Move Hadoop to hduser Home.
- Execute the following commands in a terminal to download Hadoop and extract it to the hduser home directory
su - hduser
cd /home/hduser/
wget http://apache.osuosl.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1.tar.gz
tar -zxvf hadoop-2.4.1.tar.gz
mv hadoop-2.4.1 hadoop
- Open "$HOME/.bashrc" file with gedit as hduser.
gedit $HOME/.bashrc
- Add/Modify the following Environment Variables.
export HADOOP_PREFIX=/home/hduser/hadoop
export JAVA_HOME=/usr/java/jdk1.7.0_51
export PATH=$PATH:$HADOOP_PREFIX/bin:$JAVA_HOME/bin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_PREFIX/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
- Save & Close the file then Execute the bash file:
exec bash
- Check the PATH environment variable:
echo $PATH
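- As a further check (my addition, not in the original steps), the hadoop command should now resolve from the updated PATH:
# should print "Hadoop 2.4.1" along with build details
hadoop version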
- Open "/home/hduser/hadoop/etc/hadoop/hadoop-env.sh" file with gedit as hduser.
gedit /home/hduser/hadoop/etc/hadoop/hadoop-env.sh
- Add/Modify the following environment variable with your Java installation path
export JAVA_HOME=/usr/java/jdk1.7.0_51
- Save & Close the file
- Create a temp directory in the hduser home directory
mkdir /home/hduser/tmp
- Open "/home/hduser/hadoop/etc/hadoop/core-site.xml" flle with gedit as hduser
gedit /home/hduser/hadoop/etc/hadoop/core-site.xml
- Add the following configurations in "core-site.xml" file. Then Save & Close the file
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/tmp</value>
    <description>Base temporary directories</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>Default file system name</description>
  </property>
</configuration>
- Create a data directory (note the -p flag so the intermediate dfs directory is created too)
mkdir -p /home/hduser/tmp/dfs/data
- Open "/home/hduser/hadoop/etc/hadoop/hdfs-site.xml" flle with gedit as hduser
gedit /home/hduser/hadoop/etc/hadoop/hdfs-site.xml
- Add the following configurations in "hdfs-site.xml" file. Then Save & Close the file
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hduser/tmp/dfs/data</value>
  </property>
</configuration>
- Open "/home/hduser/hadoop/etc/hadoop/mapred-site.xml" file with gedit as hduser
gedit /home/hduser/hadoop/etc/hadoop/mapred-site.xml
- Add the following configurations in "mapred-site.xml" file. Then Save & Close the file
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>Host and port for the MapReduce JobTracker</description>
  </property>
</configuration>
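- A side note (my addition, not in the original post): in Hadoop 2.x the JobTracker is replaced by YARN, so mapred.job.tracker is effectively ignored. To run MapReduce jobs on YARN, the property usually set in "mapred-site.xml" is:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>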
- Open "/home/hduser/hadoop/etc/hadoop/yarn-site.xml" file with gedit as hduser
gedit /home/hduser/hadoop/etc/hadoop/yarn-site.xml
- Add the following configurations in "yarn-site.xml" file. Then Save & Close the file
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8050</value>
  </property>
</configuration>
Step 8: Format the NameNode and Start Hadoop
- Execute the following commands to format the NameNode and start the HDFS and YARN daemons
/home/hduser/hadoop/bin/hdfs namenode -format
/home/hduser/hadoop/sbin/start-dfs.sh
/home/hduser/hadoop/sbin/start-yarn.sh
- List the running Hadoop daemons
jps
It will list the running daemons, such as NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. (JobTracker and TaskTracker existed only in Hadoop 1.x; in Hadoop 2.x they are replaced by YARN's ResourceManager and NodeManager.)
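- As a final sanity check (not part of the original post; a minimal sketch using standard Hadoop 2.x commands), create a home directory for hduser in HDFS and run the bundled pi example:
# create and list the hduser home directory in HDFS
/home/hduser/hadoop/bin/hdfs dfs -mkdir -p /user/hduser
/home/hduser/hadoop/bin/hdfs dfs -ls /user
# run the example MapReduce job shipped with Hadoop 2.4.1 (2 maps, 5 samples each)
/home/hduser/hadoop/bin/hadoop jar /home/hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 2 5
The NameNode web UI should also be reachable at http://localhost:50070 and the ResourceManager web UI at http://localhost:8088.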