This post covers installing Hadoop 2.4.1 (the latest stable version) as a Single Node Cluster on Ubuntu 14.04. The Single Node Cluster installation is suitable only for beginners and for practice. Since Hadoop is meant for distributed environments, a Multi Node Cluster is more suitable for production. The Multi Node Cluster installation steps are almost the same as the Single Node Cluster steps, with a few configuration changes. My next post will cover the Multi Node Cluster Hadoop installation.
The Hadoop installation requires a few prerequisite steps: installing Java and ssh, creating a dedicated Linux user account, disabling IPv6, and generating an ssh key pair for that account. These prerequisites are needed because Hadoop nodes communicate over the Secure Shell protocol (ssh). Disabling IPv6 is required since Hadoop does not support IPv6.
Step 1: Install Oracle Java JDK7 on Ubuntu.
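- Oracle JDK7 is not in the stock Ubuntu 14.04 repositories. One common approach (a sketch, assuming the third-party webupd8team PPA and its oracle-java7-installer package; note this installs Java under /usr/lib/jvm/java-7-oracle, so adjust JAVA_HOME in the later steps if the /usr/java/jdk1.7.0_51 path used below does not exist on your machine):
# add the webupd8team PPA and install the Oracle Java 7 installer package
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
# verify the installation
java -version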
Step 2: Install SSH-Server
- To install the OpenSSH server, execute the following command in a terminal
sudo apt-get install openssh-server
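- To confirm the ssh daemon is running, a quick check (my addition, assuming Ubuntu 14.04's default upstart service manager) is:
# should report that ssh is running
sudo service ssh status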
Step 3: Create a Dedicated Hadoop User Account
- Execute the following commands to create the user group and user account. Give all required information while creating the user account and give the password as "hduser".
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo
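- To verify the new account (my addition, not in the original steps), the id command prints the user's uid and group memberships:
# hduser should show hadoop as its primary group and sudo as a supplementary group
id hduser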
- Open the "/etc/sysctl.conf" file with gedit as sudo user.
sudo gedit /etc/sysctl.conf
- Disable IPv6 by adding/modifying the following lines at the end of the file.
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
- Save & close the file then reboot the machine.
sudo shutdown -r now
- Check the configuration
sudo sysctl -p
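- Another quick check (my addition, not in the original steps) is to read the kernel flag directly; it prints 1 once IPv6 is disabled:
# 1 means IPv6 is disabled
cat /proc/sys/net/ipv6/conf/all/disable_ipv6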
Step 5: Generate an SSH Key for hduser
- Execute the following commands to generate an ssh key pair with an empty passphrase and authorize it for passwordless login.
su - hduser
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
- Verify the passwordless ssh login to localhost.
su - hduser
ssh localhost
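- On the first connection, ssh will ask you to confirm the host's fingerprint; answer "yes". If the key setup is correct, no password is requested. Then leave the test session:
# return from the ssh session to the original shell
exit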
Step 6: Download, Extract and Move Hadoop to hduser Home.
- Execute the following commands in a terminal to download Hadoop and extract it to the hduser home directory
su - hduser
cd /home/hduser/
wget http://apache.osuosl.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1.tar.gz
tar -zxvf hadoop-2.4.1.tar.gz
mv hadoop-2.4.1 hadoop
- Open "$HOME/.bashrc" file with gedit as hduser.
gedit $HOME/.bashrc
- Add/Modify the following Environment Variables.
export HADOOP_PREFIX=/home/hduser/hadoop
export JAVA_HOME=/usr/java/jdk1.7.0_51
export PATH=$PATH:$HADOOP_PREFIX/bin:$JAVA_HOME/bin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_PREFIX/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
- Save & Close the file then Execute the bash file:
exec bash
- Check the PATH environment variable:
echo $PATH
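- As a further check (my addition, not in the original steps), the hadoop command should now resolve from the updated PATH:
# should print "Hadoop 2.4.1" along with build details
hadoop version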
- Open "/home/hduser/hadoop/etc/hadoop/hadoop-env.sh" file with gedit as hduser.
gedit /home/hduser/hadoop/etc/hadoop/hadoop-env.sh
- Add/Modify the following environment variable with your Java installation path
export JAVA_HOME=/usr/java/jdk1.7.0_51
- Save & Close the file
- Create a temp directory in the hduser home directory
mkdir /home/hduser/tmp
- Open "/home/hduser/hadoop/etc/hadoop/core-site.xml" flle with gedit as hduser
gedit /home/hduser/hadoop/etc/hadoop/core-site.xml
- Add the following configurations in "core-site.xml" file. Then Save & Close the file
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/tmp</value>
    <description>Base temporary directories</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>Default file system name</description>
  </property>
</configuration>
- Create a data directory (note the -p flag so the intermediate dfs directory is created too)
mkdir -p /home/hduser/tmp/dfs/data
- Open "/home/hduser/hadoop/etc/hadoop/hdfs-site.xml" flle with gedit as hduser
gedit /home/hduser/hadoop/etc/hadoop/hdfs-site.xml
- Add the following configurations in "hdfs-site.xml" file. Then Save & Close the file
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hduser/tmp/dfs/data</value>
  </property>
</configuration>
- Open "/home/hduser/hadoop/etc/hadoop/mapred-site.xml" file with gedit as hduser
gedit /home/hduser/hadoop/etc/hadoop/mapred-site.xml
- Add the following configurations in "mapred-site.xml" file. Then Save & Close the file
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>Host and port for the MapReduce JobTracker</description>
  </property>
</configuration>
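- A side note (my addition, not in the original post): in Hadoop 2.x the JobTracker is replaced by YARN, so mapred.job.tracker is effectively ignored. To run MapReduce jobs on YARN, the property usually set in "mapred-site.xml" is:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>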
- Open "/home/hduser/hadoop/etc/hadoop/yarn-site.xml" file with gedit as hduser
gedit /home/hduser/hadoop/etc/hadoop/yarn-site.xml
- Add the following configurations in "yarn-site.xml" file. Then Save & Close the file
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8050</value>
  </property>
</configuration>
Step 8: Format the NameNode and Start Hadoop
- Execute the following commands to format the NameNode and start the HDFS and YARN daemons
/home/hduser/hadoop/bin/hdfs namenode -format
/home/hduser/hadoop/sbin/start-dfs.sh
/home/hduser/hadoop/sbin/start-yarn.sh
- List the running Hadoop daemons
jps
It will list the running daemons, such as NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. (JobTracker and TaskTracker existed only in Hadoop 1.x; in Hadoop 2.x they are replaced by YARN's ResourceManager and NodeManager.)
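- As a final sanity check (not part of the original post; a minimal sketch using standard Hadoop 2.x commands), create a home directory for hduser in HDFS and run the bundled pi example:
# create and list the hduser home directory in HDFS
/home/hduser/hadoop/bin/hdfs dfs -mkdir -p /user/hduser
/home/hduser/hadoop/bin/hdfs dfs -ls /user
# run the example MapReduce job shipped with Hadoop 2.4.1 (2 maps, 5 samples each)
/home/hduser/hadoop/bin/hadoop jar /home/hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 2 5
The NameNode web UI should also be reachable at http://localhost:50070 and the ResourceManager web UI at http://localhost:8088.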