Install Hadoop On Ubuntu


Hadoop Installation

Prerequisite

Install Java before installing Hadoop. The steps below assume a Java 8 JDK located at /usr/lib/jvm/java-8-oracle.
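
A quick way to confirm the prerequisite is to ask the shell whether a java binary is on the PATH. The snippet below is only a sketch; the function name check_java is made up for this post and is not part of Java or Hadoop.

```shell
# Report whether a Java runtime is on the PATH before continuing.
# check_java is a hypothetical helper name used only in this post.
check_java() {
    if command -v java >/dev/null 2>&1; then
        # java prints its version banner on stderr, so merge it into stdout.
        java -version 2>&1 | head -n 1
    else
        echo "java not found: install a JDK first"
        return 1
    fi
}

check_java || true
```

If the second message appears, install a JDK and check again before moving on to Step 1.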

Hadoop Installation Steps


Step 1: Create Separate Login

         $ sudo addgroup hadoop
         $ sudo adduser --ingroup hadoop hdfsuser
         $ sudo adduser hdfsuser sudo 

Step 2: Install SSH

         $ sudo apt-get update
         $ sudo apt-get install ssh
         $ sudo su - hdfsuser
         $ ssh-keygen -t rsa -P ""
               >> If it asks for a file name or location, press Enter to accept the default.
         $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
         $ chmod 0600 ~/.ssh/authorized_keys
         $ exit 
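
Hadoop's start scripts log in to localhost over SSH, so it is worth checking that the key works without a password. The sketch below should be run as hdfsuser (for example after sudo su - hdfsuser). The function name check_ssh_localhost is made up for this post. It assumes the host key for localhost has already been accepted, which you can do by running ssh localhost once by hand.

```shell
# Check that key-based SSH to localhost works without any prompt.
# check_ssh_localhost is a hypothetical helper name used only in this post.
check_ssh_localhost() {
    # -n keeps ssh from reading stdin; BatchMode=yes makes ssh fail
    # instead of asking for a password or passphrase.
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 localhost true 2>/dev/null; then
        echo "passwordless ssh to localhost works"
    else
        echo "passwordless ssh to localhost failed"
        return 1
    fi
}

# Example (as hdfsuser):
#   check_ssh_localhost
```

If it reports a failure, recheck the permissions on ~/.ssh/authorized_keys and confirm that the public key was appended to it.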

Step 3: Install Hadoop on Ubuntu

         $ wget http://www-us.apache.org/dist/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz
         $ tar xvzf hadoop-3.1.0.tar.gz
         $ sudo mkdir -p /usr/local/hadoop
         $ cd hadoop-3.1.0/  
         $ sudo mv * /usr/local/hadoop
         $ sudo chown -R hdfsuser:hadoop /usr/local/hadoop  
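
After moving the files, a short check can confirm that the expected pieces landed under /usr/local/hadoop. This is a rough sketch: the function name check_hadoop_layout is made up for this post, and the list of paths is just a sample of files that the later steps rely on.

```shell
# Verify that a Hadoop directory contains the files used in later steps.
# check_hadoop_layout is a hypothetical helper name used only in this post.
check_hadoop_layout() {
    dir="${1:-/usr/local/hadoop}"
    for path in bin/hadoop bin/hdfs sbin/start-all.sh etc/hadoop/core-site.xml; do
        if [ ! -e "$dir/$path" ]; then
            echo "missing: $dir/$path"
            return 1
        fi
    done
    echo "layout ok: $dir"
}

# Example:
#   check_hadoop_layout /usr/local/hadoop
```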

Step 4: Hadoop Configuration

We have to configure the following files.
  1. ~/.bashrc
  2. hadoop-env.sh
  3. core-site.xml
  4. hdfs-site.xml
  5. yarn-site.xml

1. ~/.bashrc

       $ vi ~/.bashrc
Append the following variables to the .bashrc file.
# ************************************************ #
#HADOOP VARIABLES START

export JAVA_HOME=/usr/lib/jvm/java-8-oracle

export HADOOP_HOME=/usr/local/hadoop

export PATH=$PATH:$HADOOP_HOME/bin

export PATH=$PATH:$HADOOP_HOME/sbin

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export HADOOP_YARN_HOME=$HADOOP_HOME

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

#HADOOP VARIABLES END 

# ************************************************ #

        $ source ~/.bashrc
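
Once the file has been sourced, the variables should be visible in the current shell. The sketch below checks that HADOOP_HOME is set and that its bin and sbin directories are on the PATH. The function name check_hadoop_env is made up for this post.

```shell
# Confirm that the variables appended to ~/.bashrc are active in this shell.
# check_hadoop_env is a hypothetical helper name used only in this post.
check_hadoop_env() {
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set"
        return 1
    fi
    for sub in bin sbin; do
        # Wrap PATH in colons so every entry can be matched as ":entry:".
        case ":$PATH:" in
            *":$HADOOP_HOME/$sub:"*) ;;
            *) echo "PATH is missing $HADOOP_HOME/$sub"; return 1 ;;
        esac
    done
    echo "environment ok"
}

# Example:
#   check_hadoop_env
```

If the environment looks right, running hadoop version should also print the installed release.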

2. hadoop-env.sh

        $ sudo vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Now, set JAVA_HOME:
        export JAVA_HOME=/usr/lib/jvm/java-8-oracle

3. core-site.xml

Create the temporary directory and give hdfsuser ownership of it.
         $ sudo mkdir -p /apps/hadoop/tmp
         $ sudo chown hdfsuser:hadoop /apps/hadoop/tmp

Open the core-site.xml file.
         $ sudo vi /usr/local/hadoop/etc/hadoop/core-site.xml

Append the following properties within the configuration tags.

<property>
    <name>hadoop.tmp.dir</name>
    <value>/apps/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
</property>

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:54310</value>
    <description>The URI of the default file system. fs.defaultFS replaces the deprecated fs.default.name.</description>
</property>
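
To double-check what was saved, the value of a property can be read back from the file. The sketch below is a rough helper, not a real XML parser: it assumes each name and value tag sits on its own line, as in the snippets in this post. The function name get_site_property is made up for this post.

```shell
# Print the <value> that follows a given <name> in a Hadoop *-site.xml file.
# Assumes one tag per line, as in this post; this is not a general XML parser.
# get_site_property is a hypothetical helper name used only in this post.
get_site_property() {
    file="$1"
    name="$2"
    awk -v want="<name>$name</name>" '
        index($0, want) { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$file"
}

# Example:
#   get_site_property /usr/local/hadoop/etc/hadoop/core-site.xml hadoop.tmp.dir
```

Hadoop also ships its own lookup, hdfs getconf -confKey followed by the property name, which reports the value Hadoop itself resolves.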

4. hdfs-site.xml

Create two directories, one for the NameNode and one for the DataNode.
         $ sudo mkdir -p /usr/local/hadoop/namenode
         $ sudo mkdir -p /usr/local/hadoop/datanode
         $ sudo chown -R hdfsuser:hadoop /usr/local/hadoop

Open the hdfs-site.xml file and append the following properties within the configuration tags.

         $ sudo vi /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replicas can be specified when the file is created. The default is used if replication is not specified at create time.</description>
</property>

<property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/usr/local/hadoop/namenode</value>
</property>

<property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/usr/local/hadoop/datanode</value>
</property>
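
The NameNode and DataNode will not start if they cannot write to these directories, so it helps to confirm that both exist and belong to hdfsuser. The sketch below uses stat -c %U from GNU coreutils, which Ubuntu provides. The function name check_dir_owner is made up for this post.

```shell
# Check that a directory exists and is owned by the expected user.
# check_dir_owner is a hypothetical helper name used only in this post.
check_dir_owner() {
    dir="$1"
    want="$2"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 1
    fi
    # GNU stat prints the owner's user name for %U.
    owner=$(stat -c %U "$dir")
    if [ "$owner" != "$want" ]; then
        echo "$dir is owned by $owner, expected $want"
        return 1
    fi
    echo "ok: $dir"
}

# Example:
#   check_dir_owner /usr/local/hadoop/namenode hdfsuser
#   check_dir_owner /usr/local/hadoop/datanode hdfsuser
```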

5. yarn-site.xml

Open the yarn-site.xml file and append the following property within the configuration tags.

         $ sudo vi /usr/local/hadoop/etc/hadoop/yarn-site.xml

<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
</property>


Step 5: Format Hadoop file system

             $ hdfs namenode -format

Step 6: Start Hadoop Daemons

        $ cd /usr/local/hadoop/
        $ ./sbin/start-all.sh 
Check the running daemons.
        $ jps
It should show the following daemons.
 
        SecondaryNameNode
        ResourceManager
        DataNode
        NodeManager
        NameNode
        Jps
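
Comparing the jps listing against the expected names by eye is easy to get wrong, because SecondaryNameNode also contains the text NameNode. The sketch below reads jps output on standard input and reports any daemon from the list above that is absent. The function name check_daemons is made up for this post.

```shell
# Read jps output on stdin and report any expected daemon that is absent.
# check_daemons is a hypothetical helper name used only in this post.
check_daemons() {
    listing=$(cat)
    missing=""
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <class name>", so compare the second field exactly.
        if ! printf '%s\n' "$listing" | awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }'; then
            missing="$missing $daemon"
        fi
    done
    if [ -n "$missing" ]; then
        echo "not running:$missing"
        return 1
    fi
    echo "all daemons running"
}

# Example:
#   jps | check_daemons
```

If a daemon is reported as not running, the log files under /usr/local/hadoop/logs are the first place to look.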

Step 7: Stop Hadoop daemons

        $ cd /usr/local/hadoop/
        $ ./sbin/stop-all.sh



