INSTALL SINGLE-NODE HADOOP ON CENTOS IN 10 STEPS

1- Download the latest version of VMware Player

https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/7_0

2- Download the latest version of CentOS

http://www.centos.org/download/

3- Install CentOS on VMWare

 Full name: Mehmet Sen
 Username: hduser
 Password: hduser
 Virtual machine name: CentOS 64-bit
 Location: C:\Users\senmehmet\Documents\Virtual Machines\CentOS 64-bit
 Maximum disk size (GB): 20.0
 Split virtual disk into multiple files
 RAM: 2 GB

4- Open a new Terminal

5- If you are logged in as root in the terminal, here is how you would add a user (we already created hduser during installation, so you can skip this step)

# adduser huser
# groupadd hgroup
# usermod -g hgroup huser
# id huser
uid=1001(huser) gid=1002(hgroup) groups=1002(hgroup)
# su - huser
# su - root (to go back to root)
Password: hduser

6- Change to user hduser

# su - hduser (now let's switch to our real user)

7- Let's generate SSH keys

[hduser@localhost ~]# ssh-keygen -t rsa
Created directory '/home/hduser/.ssh'
Press Enter at every passphrase prompt to leave the passphrase empty
Your identification has been saved in /home/hduser/.ssh/id_rsa
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub
[hduser@localhost ~]# cd .ssh/
[hduser@localhost .ssh]# cp id_rsa.pub authorized_keys
[hduser@localhost .ssh]# ls -l

You will see three files:

 authorized_keys
 id_rsa
 id_rsa.pub

[hduser@localhost .ssh]# chmod 600 authorized_keys
[hduser@localhost .ssh]# ssh localhost
SSH is strict, so make sure this works. On the first connection you will see:

The authenticity of host 'localhost (::1)' can't be established

Type 'yes' to continue. SSH will add localhost to known_hosts and log you in without a password:

Warning: Permanently added 'localhost' (ECDSA)
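To confirm that passwordless SSH really works before Hadoop depends on it, you can run a quick non-interactive check. A minimal sketch (the echoed text is just an arbitrary marker):

 # Ask ssh to fail rather than prompt for a password; prints "ssh ok" only if key auth works
 ssh -o BatchMode=yes localhost 'echo ssh ok'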

8- Let's install Java. Open another new terminal and switch to root

[hduser@localhost ~]# su - root
Password: hduser
[root@localhost ~]# cd /opt/

Download jdk-8u31-linux-x64.tar.gz:

[root@localhost opt]# wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u31-b13/jdk-8u31-linux-x64.tar.gz"

Now extract it

[root@localhost opt]# tar xzf jdk-8u31-linux-x64.tar.gz

This creates the directory jdk1.8.0_31.
If the java command still points to the old JDK, check where the alternatives link goes:

[root@localhost ~]# cd /etc/alternatives/
[root@localhost alternatives]# ls -l java
java -> /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.51-2.4.5.5.el7.x86_64/jre/bin/java

Now let's make the new JDK 1.8 the default:

[root@localhost alternatives]# alternatives --install /usr/bin/java java /opt/jdk1.8.0_31/bin/java 2
[root@localhost alternatives]# alternatives --config java

You will see a list of JDKs; select the number of the newest one (here, number 2, which is /opt/jdk1.8.0_31/bin/java).

At this point Java 8 has been successfully installed on your system. We also recommend setting up the javac and jar command paths using alternatives:

[root@localhost alternatives]# alternatives --install /usr/bin/jar jar /opt/jdk1.8.0_31/bin/jar 2
[root@localhost alternatives]# alternatives --install /usr/bin/javac javac /opt/jdk1.8.0_31/bin/javac 2
[root@localhost alternatives]# alternatives --set jar /opt/jdk1.8.0_31/bin/jar
[root@localhost alternatives]# alternatives --set javac /opt/jdk1.8.0_31/bin/javac
[root@localhost alternatives]# update-alternatives --install /usr/bin/jps jps /opt/jdk1.8.0_31/bin/jps 1
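Before checking the version, you can also ask the alternatives system what it currently points at; alternatives --display lists every registered candidate and marks the active one:

 # Show all registered java candidates and the current selection
 alternatives --display java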

Now check your Java version:

[root@localhost alternatives]# java -version
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)

Set up the JAVA_HOME variable:

[root@localhost alternatives]# export JAVA_HOME=/opt/jdk1.8.0_31

Set up the JRE_HOME variable:

[root@localhost alternatives]# export JRE_HOME=/opt/jdk1.8.0_31/jre

Set up the PATH variable:

[root@localhost alternatives]# export PATH=$PATH:/opt/jdk1.8.0_31/bin:/opt/jdk1.8.0_31/jre/bin
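Note that export only affects the current shell session; after a reboot these variables are gone. One way to make them permanent, sketched below on the assumption that you want them system-wide (CentOS reads /etc/profile.d/*.sh at login), is a small profile script:

# As root: persist the Java environment for all future login shells
cat > /etc/profile.d/java.sh <<'EOF'
export JAVA_HOME=/opt/jdk1.8.0_31
export JRE_HOME=/opt/jdk1.8.0_31/jre
export PATH=$PATH:/opt/jdk1.8.0_31/bin:/opt/jdk1.8.0_31/jre/bin
EOF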

 

9- Let's download and set up Hadoop

Reference:  http://tecadmin.net/setup-hadoop-2-4-single-node-cluster-on-linux/ and https://www.youtube.com/watch?v=VoDEIyXtO5U

First, check the latest version of Hadoop:

http://hadoop.apache.org/releases.html (in this case it is 2.6)

Let's download Hadoop under /opt:

[root@localhost alternatives]# cd /opt/
[root@localhost opt]# wget http://apache.claz.org/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
[root@localhost opt]# tar -zxvf hadoop-2.6.0.tar.gz
[root@localhost opt]# chown -R hduser:hduser /opt/hadoop-2.6.0

Now go back to your hduser terminal. First we need to set the environment variables used by Hadoop. Edit ~/.bashrc and append the following lines at the end of the file:

[hduser@localhost opt]# vi ~/.bashrc

 export HADOOP_HOME=/opt/hadoop-2.6.0
 export HADOOP_INSTALL=$HADOOP_HOME
 export HADOOP_MAPRED_HOME=$HADOOP_HOME
 export HADOOP_COMMON_HOME=$HADOOP_HOME
 export HADOOP_HDFS_HOME=$HADOOP_HOME
 export YARN_HOME=$HADOOP_HOME
 export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
 export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Save and exit with :wq!
Now apply the changes to the current environment:

[hduser@localhost opt]# source ~/.bashrc
[hduser@localhost opt]# cd $HADOOP_HOME/etc/hadoop
[hduser@localhost hadoop]# vi hadoop-env.sh

Inside hadoop-env.sh, set the JAVA_HOME line to:

 export JAVA_HOME=/opt/jdk1.8.0_31/
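A quick sanity check is worthwhile here; both commands below should succeed if .bashrc and hadoop-env.sh are wired up correctly. A minimal sketch:

 # Expect /opt/hadoop-2.6.0, then "Hadoop 2.6.0" in the first line of output
 echo $HADOOP_HOME
 hadoop version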

Now let's edit the rest of Hadoop's config files. First, go to hduser's home directory:

[hduser@localhost hadoop]# cd ~/
[hduser@localhost ~]# mkdir -p hadoopdata/hdfs/namenode
[hduser@localhost ~]# mkdir -p hadoopdata/hdfs/datanode

Now go back again to the hadoop config directory

[hduser@localhost ~]# cd $HADOOP_HOME/etc/hadoop
[hduser@localhost hadoop]# cp mapred-site.xml.template mapred-site.xml
[hduser@localhost hadoop]# vi mapred-site.xml

Add the following between the <configuration> tags:

 <property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
 </property>

[hduser@localhost hadoop]# vi yarn-site.xml

Add the following between the <configuration> tags:

 <property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
 </property>

[hduser@localhost hadoop]# vi core-site.xml

Add the following between the <configuration> tags:

 <property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
 </property>

[hduser@localhost hadoop]# vi hdfs-site.xml

Add the following between the <configuration> tags:

 <property>
  <name>dfs.replication</name>
  <value>1</value>
 </property>
 <property>
  <name>dfs.name.dir</name>
  <value>file:///home/hduser/hadoopdata/hdfs/namenode</value>
 </property>
 <property>
  <name>dfs.data.dir</name>
  <value>file:///home/hduser/hadoopdata/hdfs/datanode</value>
 </property>
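Malformed XML in any of these files will make the daemons fail at startup with a parse error, so it can pay to validate them before continuing. A minimal sketch, assuming xmllint is available (it ships with the libxml2 package):

 # Silence means every file is well-formed XML
 cd $HADOOP_HOME/etc/hadoop
 xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml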

Let's format the namenode using the following command:

[hduser@localhost hadoop]# cd /opt/
[hduser@localhost opt]# hdfs namenode -format

You should see the message: Storage directory /home/hduser/hadoopdata/hdfs/namenode has been successfully formatted

Let's start the Hadoop cluster:

[hduser@localhost opt]# start-dfs.sh

If the command is not found on your PATH, run the script directly: $HADOOP_HOME/sbin/start-dfs.sh
Don't worry about "can't be established" or "Unable to load native-hadoop library" warnings in the output.

Let's start the YARN resource manager:

[hduser@localhost opt]# start-yarn.sh

If the command is not found on your PATH, run the script directly: $HADOOP_HOME/sbin/start-yarn.sh

Let's list the Java processes:

[hduser@localhost opt]# jps

You will see output similar to this:

 82289 DataNode
 83712 Jps
 82532 SecondaryNameNode
 82758 ResourceManager
 82857 NodeManager
 82172 NameNode

For more detail on the Java processes, you can also run:

[hduser@localhost opt]# jps -lm
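If one of the daemons is missing from the jps list, its log file usually says why; the daemon logs live under $HADOOP_HOME/logs. A minimal sketch (the exact file name depends on your hostname):

 # HDFS log names follow the pattern hadoop-<user>-<daemon>-<hostname>.log
 tail -n 50 $HADOOP_HOME/logs/hadoop-hduser-namenode-*.log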

 

10- Accessing Hadoop Services in the Browser

 Access port 50070 for NameNode information: http://localhost:50070
 Access port 8088 for cluster and application information: http://localhost:8088
 Access port 50090 for SecondaryNameNode details: http://localhost:50090
 Access port 50075 for DataNode details: http://localhost:50075
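If one of these pages does not load, you can check from the terminal whether the daemon is actually listening. A minimal sketch that just asks each port for its HTTP status code:

 # Print the HTTP status for each web UI; 200 means the daemon is up
 for port in 50070 8088 50090 50075; do
   echo -n "$port: "
   curl -s -o /dev/null -w '%{http_code}\n' http://localhost:$port/
 done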

Make the required HDFS directories using the following commands:

[hduser@localhost opt]# cd /opt/hadoop-2.6.0
[hduser@localhost hadoop-2.6.0]# bin/hdfs dfs -mkdir /user
[hduser@localhost hadoop-2.6.0]# bin/hdfs dfs -mkdir /user/hduser
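As an aside, the two calls can be collapsed into one: hdfs dfs -mkdir accepts -p to create missing parent directories, just like the Unix mkdir. A minimal equivalent:

 # Create /user and /user/hduser in a single call
 bin/hdfs dfs -mkdir -p /user/hduser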

Let's check that they were created:

[hduser@localhost hadoop-2.6.0]# hdfs dfs -ls /
drwxr-xr-x – hduser supergroup 0 2015-02-09 16:57 /user
[hduser@localhost hadoop-2.6.0]# hdfs dfs -ls /user
drwxr-xr-x – hduser supergroup 0 2015-02-09 16:57 /user/hduser
[hduser@localhost hadoop-2.6.0]# hdfs dfs -ls /user/hduser
drwxr-xr-x – hduser supergroup 0 2015-02-09 16:57 /user/hduser/logs

First, make sure you install the Apache web server (httpd) from the root terminal:

[root@localhost etc]# yum -y install httpd
[root@localhost etc]# systemctl enable httpd.service
[root@localhost etc]# systemctl start httpd.service
[root@localhost etc]# systemctl status httpd.service

Now make sure the httpd log files are owned by hduser:

[root@localhost etc]# chown -R hduser:hduser /var/log/httpd/

Go back to the hduser terminal and copy the /var/log/httpd directory from the local file system to the Hadoop distributed file system:

[hduser@localhost hadoop-2.6.0]# bin/hdfs dfs -put /var/log/httpd logs

Now view the files at http://localhost:50070/explorer.html#/user/hduser/logs/httpd
Finally, copy the logs directory from the Hadoop distributed file system back to the local system:

[hduser@localhost hadoop-2.6.0]# hdfs dfs -get logs /tmp/logs
[hduser@localhost hadoop-2.6.0]# ls -l /tmp/logs/
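To convince yourself the round trip through HDFS was lossless, you can diff the retrieved copy against the original. A minimal sketch, assuming the put above landed the directory at logs/httpd so the get produced /tmp/logs/httpd:

 # No output means the two directory trees are identical
 diff -r /var/log/httpd /tmp/logs/httpd

When you are finished, stop-yarn.sh and stop-dfs.sh shut the cluster down cleanly.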
