
Install Latest Hadoop 2.4.1 on Ubuntu 14.04

This post is about a Hadoop 2.4.1 (latest stable version) Single Node Cluster installation on Ubuntu 14.04. The Single Node Cluster installation is suitable only for beginners who are practicing; a Multi Node Cluster is more suitable for production environments, since Hadoop is meant for distributed environments. The Multi Node Cluster installation steps are almost the same as the Single Node Cluster steps, with a few configuration changes. My next post will cover the Multi Node Cluster Hadoop installation.

The Hadoop installation requires a few prerequisite steps: installing Java and SSH, creating a dedicated Linux user account, disabling IPv6, and generating an SSH key pair for the user account. These prerequisites are needed because Hadoop nodes communicate over the Secure Shell protocol (SSH). Disabling IPv6 is required since Hadoop does not support IPv6.

Step 1: Install Oracle Java JDK7 on Ubuntu.
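  • If Java is not installed yet, one common approach on Ubuntu 14.04 was the WebUpd8 PPA (a sketch, assuming the PPA is still reachable from your machine); verify afterwards with java -version.
    sudo add-apt-repository ppa:webupd8team/java
    sudo apt-get update
    sudo apt-get install oracle-java7-installer
    java -version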

Step 2: Install SSH-Server
  • To install the OpenSSH server, execute the following command in a terminal
    sudo apt-get install openssh-server
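  • To confirm the SSH service is running, you can check its status (a quick sanity check; the exact output varies by system):
    sudo service ssh status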
Step 3: Create an "hduser" user account under a "hadoop" user group.
  • Execute the following commands to create the user group and the user account. Provide the requested details while creating the account and set a password (this guide uses "hduser").
    sudo addgroup hadoop
    sudo adduser --ingroup hadoop hduser
    sudo adduser hduser sudo
Step 4: Disable IPv6
  • Open the "/etc/sysctl.conf" file with gedit as sudo user.
    sudo gedit /etc/sysctl.conf
  • Disable IPv6 by adding (or modifying) the following lines at the end of the file.
    #disable ipv6
    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1
    net.ipv6.conf.lo.disable_ipv6 = 1
  • Save & close the file then reboot the machine.
    sudo shutdown -r now
  • Check the configuration
    sudo sysctl -p
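  • You can also verify IPv6 is disabled directly; a value of 1 means IPv6 is off:
    cat /proc/sys/net/ipv6/conf/all/disable_ipv6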
Step 5: Generate key-gen for hduser.
  • Switch to hduser, generate an RSA key pair with an empty passphrase (accept the defaults when prompted), and authorize the key for local logins.
    su - hduser
    ssh-keygen -t rsa -P ""
    cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
  • Now verify that SSH to localhost works (the first connection will prompt you to accept the host key; answer yes)
    su - hduser
    ssh localhost
  • If the SSH connection does not work, recheck Steps 4 and 5 and redo them carefully.
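  • A common cause of passwordless SSH failures is wrong permissions on the .ssh directory; OpenSSH requires them to be restrictive, so tightening them often helps:
    chmod 700 $HOME/.ssh
    chmod 600 $HOME/.ssh/authorized_keys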

Step 6: Download, Extract and Move Hadoop to hduser Home.
  • Execute the following commands in a terminal to download Hadoop and extract it into hduser's home directory
    su - hduser
    cd /home/hduser/
    wget http://apache.osuosl.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1.tar.gz
    tar -zxvf hadoop-2.4.1.tar.gz
    mv hadoop-2.4.1 hadoop
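  • To confirm the extraction worked, you can print the Hadoop version (this assumes Java is already on your PATH):
    /home/hduser/hadoop/bin/hadoop version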
Step 7: Configure Environment Variables.
  • Open the "$HOME/.bashrc" file with gedit as hduser.
    gedit $HOME/.bashrc
  • Add/Modify the following environment variables (adjust JAVA_HOME to match your actual JDK installation path).
    export HADOOP_PREFIX=/home/hduser/hadoop
    export JAVA_HOME=/usr/java/jdk1.7.0_51
    export PATH=$PATH:$HADOOP_PREFIX/bin:$JAVA_HOME/bin
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_PREFIX/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
    
  • Save & close the file, then restart the shell so the new variables take effect:
    exec bash
  • Check the PATH environment variable:
    echo $PATH
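  • As a quick sanity check (assuming JAVA_HOME points at a real JDK), the hadoop command should now resolve from any directory:
    which hadoop
    hadoop version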
Step 8: Configure the Hadoop Environment Variable
  • Open "/home/hduser/hadoop/etc/hadoop/hadoop-env.sh" file with gedit as hduser.
    gedit /home/hduser/hadoop/etc/hadoop/hadoop-env.sh
  • Add/Modify the following environment variable (use the same JAVA_HOME path as in Step 7)
    export JAVA_HOME=/usr/java/jdk1.7.0_51
  • Save & Close the file
Step 9: Configure the "core-site.xml" file with a working temporary directory and the default file system name.
  • Create a temp directory at hduser home
    mkdir /home/hduser/tmp
  • Open "/home/hduser/hadoop/etc/hadoop/core-site.xml" flle with gedit as hduser
    gedit /home/hduser/hadoop/etc/hadoop/core-site.xml
  • Add the following configuration to the "core-site.xml" file, then save & close the file. (In Hadoop 2, fs.default.name is deprecated in favor of fs.defaultFS, but the old name still works.)
    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/hduser/tmp</value>
            <description>Base temporary directories</description>
        </property>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://localhost:54310</value>
            <description>Default file system name</description>
        </property>
    </configuration>
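  • (Optional) If the libxml2-utils package is installed, xmllint can confirm the file is well-formed XML before moving on; it prints nothing on success:
    xmllint --noout /home/hduser/hadoop/etc/hadoop/core-site.xml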
    
Step 10: Configure "hdfs-site.xml" file for data directory and replication.
  • Create a data directory (the -p flag also creates the intermediate dfs directory)
    mkdir -p /home/hduser/tmp/dfs/data
  • Open "/home/hduser/hadoop/etc/hadoop/hdfs-site.xml" flle with gedit as hduser
    gedit /home/hduser/hadoop/etc/hadoop/hdfs-site.xml
  • Add the following configurations in "hdfs-site.xml" file. Then Save & Close the file
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.data.dir</name>
            <value>/home/hduser/tmp/dfs/data</value>
        </property>
    </configuration>
Step 11: Configure the "mapred-site.xml" file with the host and port of the MapReduce Job Tracker.
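  • Note: the Hadoop 2.4.1 tarball ships only a mapred-site.xml.template; if mapred-site.xml does not exist yet, create it from the template first:
    cp /home/hduser/hadoop/etc/hadoop/mapred-site.xml.template /home/hduser/hadoop/etc/hadoop/mapred-site.xml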
  • Open "/home/hduser/hadoop/etc/hadoop/mapred-site.xml" file with gedit as hduser
    gedit /home/hduser/hadoop/etc/hadoop/mapred-site.xml
  • Add the following configurations in "mapred-site.xml" file. Then Save & Close the file
    <configuration>
        <property>
            <name>mapred.job.tracker</name>
            <value>localhost:54311</value>
            <description>Host,port for MapReduce Job Tracker</description>
        </property>
    </configuration>
    
Step 12: Configure the "yarn-site.xml" file for Node Manager and Resource Manager settings. (Configuring YARN is optional.)
  • Open "/home/hduser/hadoop/etc/hadoop/yarn-site.xml" file with gedit as hduser
    gedit /home/hduser/hadoop/etc/hadoop/yarn-site.xml
  • Add the following configurations in "yarn-site.xml" file. Then Save & Close the file
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>localhost:8025</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>localhost:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>localhost:8050</value>
        </property>
    </configuration>
Step 13: Format the NameNode and start HDFS
  • Execute the following commands to format the NameNode and to start HDFS
    /home/hduser/hadoop/bin/hdfs namenode -format
    /home/hduser/hadoop/sbin/start-dfs.sh
  • List the running Hadoop daemons
    jps
    It should list the running Java processes, such as NameNode, DataNode and SecondaryNameNode. (JobTracker and TaskTracker no longer exist in Hadoop 2; their roles moved to YARN.)
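  • (Optional) If you configured YARN in Step 12, start it as well; jps should then also list ResourceManager and NodeManager, and the web UIs are normally at http://localhost:50070 (NameNode) and http://localhost:8088 (ResourceManager).
    /home/hduser/hadoop/sbin/start-yarn.sh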
The Hadoop installation is now complete, and we can start using Hadoop.
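As a quick smoke test (a minimal sketch; the /user/hduser directory name is just an example), you can create a directory in HDFS and list it:
    /home/hduser/hadoop/bin/hdfs dfs -mkdir -p /user/hduser
    /home/hduser/hadoop/bin/hdfs dfs -ls /user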
