Install Hadoop On Ubuntu
Prerequisite
Before installing Hadoop, you have to install Java.
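If Java is not installed yet, one readily available option on Ubuntu is OpenJDK 8 (a minimal sketch; the JAVA_HOME paths later in this guide assume Oracle Java 8, so adjust them to match whichever JDK you install):
$ sudo apt-get update
$ sudo apt-get install openjdk-8-jdk
$ java -version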
Hadoop Installation Steps
Step 1: Create a Separate Login
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hdfsuser
$ sudo adduser hdfsuser sudo
Step 2: Install SSH
$ sudo apt-get update
$ sudo apt-get install ssh
$ sudo su hdfsuser
$ ssh-keygen -t rsa -P ""
>> When prompted for the key file name or location, press Enter to accept the default.
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
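Before moving on, it is worth verifying that passwordless SSH works, since Hadoop's start scripts depend on it. As hdfsuser, an SSH login to localhost should succeed without a password prompt (a quick optional check; answer yes to the host-key prompt the first time):
$ sudo su hdfsuser
$ ssh localhost   # should connect without asking for a password
$ exit            # leave the SSH session
$ exit            # return to your original shell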
Step 3: Install Hadoop on Ubuntu
$ wget http://www-us.apache.org/dist/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz
$ tar xvzf hadoop-3.1.0.tar.gz
$ sudo mkdir -p /usr/local/hadoop
$ cd hadoop-3.1.0/
$ sudo mv * /usr/local/hadoop
$ sudo chown -R hdfsuser:hadoop /usr/local/hadoop
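As an optional sanity check, list the install directory; a healthy extraction shows Hadoop's standard layout (bin, etc, sbin, share, and so on):
$ ls /usr/local/hadoop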
Step 4: Hadoop Configuration
We have to configure the following files:
- ~/.bashrc
- hadoop-env.sh
- core-site.xml
- hdfs-site.xml
- yarn-site.xml
1. ~/.bashrc
$ sudo vi ~/.bashrc
Append the following variables to the .bashrc file.
# ************************************************ #
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
#HADOOP VARIABLES END
# ************************************************ #
Reload the file so the new variables take effect:
$ source ~/.bashrc
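With the variables loaded, a quick way to confirm the setup (assuming the paths above match your system) is to ask Hadoop for its version:
$ hadoop version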
2. hadoop-env.sh
$ sudo vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Now, set JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
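If you are unsure where your JDK lives, resolving the real path of the java binary shows it; JAVA_HOME is the directory above the bin/java suffix (an optional helper, not part of the original steps):
$ readlink -f $(which java)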
3. core-site.xml
Create the temporary directory that Hadoop will use and give hdfsuser ownership of it:
$ sudo mkdir -p /apps/hadoop/tmp
$ sudo chown hdfsuser:hadoop /apps/hadoop/tmp
Open the core-site.xml file:
$ sudo vi /usr/local/hadoop/etc/hadoop/core-site.xml
Append the following properties within the <configuration> tags.
<property>
<name>hadoop.tmp.dir</name>
<value>/apps/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:54310</value>
<description>The default file system URI.</description>
</property>
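After saving the file, a quick check (assuming the environment variables from the .bashrc step are loaded) confirms that Hadoop picks up the new setting:
$ hdfs getconf -confKey fs.defaultFS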
4. hdfs-site.xml
Create two directories, one for the NameNode and one for the DataNode:
$ sudo mkdir -p /usr/local/hadoop/namenode
$ sudo mkdir -p /usr/local/hadoop/datanode
$ sudo chown -R hdfsuser:hadoop /usr/local/hadoop
Open the hdfs-site.xml file, and append the following properties within the <configuration> tags:
$ sudo vi /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/datanode</value>
</property>
5. yarn-site.xml
Open the yarn-site.xml file, and append the following property within the <configuration> tags:
$ sudo vi /usr/local/hadoop/etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
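Not part of the original steps, but commonly needed: for MapReduce jobs to run on YARN rather than in local mode, mapred-site.xml in the same directory should declare YARN as the framework:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>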
Step 5: Format the Hadoop File System
Before starting Hadoop for the first time, format HDFS as hdfsuser (note that this erases any data already in HDFS):
$ hdfs namenode -format
Step 6: Start Hadoop Daemons
$ cd /usr/local/hadoop/
$ ./sbin/start-all.sh
Check the running daemons:
$ jps
It should show the following daemons:
SecondaryNameNode
ResourceManager
DataNode
NodeManager
NameNode
Jps
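The cluster can also be checked from a browser: in Hadoop 3.x the NameNode web UI listens on http://localhost:9870 and the ResourceManager UI on http://localhost:8088. As a quick smoke test (the jar path assumes the 3.1.0 layout installed above), the bundled example job estimates pi:
$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar pi 2 5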
Step 7: Stop Hadoop Daemons
$ cd /usr/local/hadoop/
$ ./sbin/stop-all.sh
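Note that start-all.sh and stop-all.sh are deprecated in recent Hadoop releases; the preferred equivalent is to manage HDFS and YARN separately:
$ ./sbin/stop-yarn.sh
$ ./sbin/stop-dfs.sh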