Posts

HDFS Commands Part - I

Prerequisite: Hadoop must be installed before you can use the Hadoop shell.
File System Shell: Most FS shell commands behave like their corresponding Linux commands. The FileSystem (FS) shell is invoked by bin/hadoop fs <args>. All FS shell commands take path URIs as arguments. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional; if they are not specified, the default scheme from the configuration is used. An HDFS file or directory such as /dir_1/dir_2 can be specified as hdfs://<namenodehost>/dir_1/dir_2 or simply as /dir_1/dir_2 (given that your configuration is set to point to hdfs://<namenodehost>). Error information is sent to stderr and the output is sent to stdout.
Basic Commands:
1. version: This command prints the Hadoop version. Example: hadoop version
2. cat: This HDFS command is used to display the conten...
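As a rough illustration of the path rules in this excerpt, the sketch below mimics how a bare path picks up the default scheme and authority. The `qualify_path` helper and the `namenodehost` value are made up for this example and are not part of Hadoop; on a real cluster the default comes from the `fs.defaultFS` setting in core-site.xml.

```shell
#!/bin/sh
# Hypothetical default filesystem (assumption for this sketch).
DEFAULT_FS="hdfs://namenodehost"

# qualify_path is an illustrative helper, not a Hadoop command.
qualify_path() {
    case "$1" in
        *://*) printf '%s\n' "$1" ;;                 # scheme already given: keep as-is
        /*)    printf '%s%s\n' "$DEFAULT_FS" "$1" ;; # absolute path: prepend the default
        *)     echo "relative paths are not handled in this sketch" >&2
               return 1 ;;                           # errors go to stderr
    esac
}

qualify_path /dir_1/dir_2
qualify_path hdfs://namenodehost/dir_1/dir_2
qualify_path file:///tmp/data.txt
```

The first two calls print the same URI, which is why hadoop fs -ls /dir_1/dir_2 and hadoop fs -ls hdfs://<namenodehost>/dir_1/dir_2 refer to the same directory when the configuration points at that namenode.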

Install Hadoop On Ubuntu

Prerequisite: Before installing Hadoop, you have to install Java.
Hadoop Installation Steps
Step 1: Create a Separate Login
    $ sudo addgroup hadoop
    $ sudo adduser --ingroup hadoop hdfsuser
    $ sudo adduser hdfsuser sudo
Step 2: Install SSH
    $ sudo apt-get update
    $ sudo apt-get install ssh
    $ sudo su hdfsuser
    $ ssh-keygen -t rsa -P ""
    >> If it asks for a file name or location, leave it blank.
    $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
...

What is Hadoop

Before getting into Hadoop, we have to understand the issues related to Big Data and traditional processing systems. In the previous post, we discussed Big Data in detail. Hadoop is an open-source batch processing framework written in Java, used to store and analyze large data sets. It is used by Google, Yahoo, Facebook, Twitter, LinkedIn, and others.
Components of Hadoop
HDFS - Hadoop Distributed File System, used to store huge data sets across the cluster.
YARN - Yet Another Resource Negotiator, used for managing cluster resources.
MapReduce - A software framework for processing huge data sets in parallel on a cluster.
Features of Hadoop
Cluster storage - Hadoop splits a single data set into multiple blocks and stores them across the cluster (more than one storage system working together) with replication (default factor 3). This increases performance and reliability.
Distributed computing - A single problem divi...
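To make the cluster storage feature a little more concrete, here is a minimal arithmetic sketch of how splitting and replication add up. The 128 MB block size and the 500 MB file are assumptions chosen for illustration; the replication factor of 3 is the default mentioned in the excerpt. `block_count` is a hypothetical helper, not a Hadoop command.

```shell
#!/bin/sh
# Assumed values for this sketch.
BLOCK_MB=128     # assumed block size in MB
REPLICATION=3    # default replication factor from the text

# block_count: ceiling of (file size / block size), both in MB.
block_count() {
    echo $(( ($1 + BLOCK_MB - 1) / BLOCK_MB ))
}

file_mb=500      # hypothetical file size
blocks=$(block_count "$file_mb")
echo "blocks: $blocks"
echo "stored block replicas: $(( blocks * REPLICATION ))"
echo "raw storage used (MB): $(( file_mb * REPLICATION ))"
```

Under these assumptions the file is split into 4 blocks, stored as 12 block replicas, and takes about 1500 MB of raw cluster storage. That extra storage is the cost of the reliability described above.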

How to Install Java on Ubuntu

Why do I need to install Java?
Java is an open-source, platform-independent programming language, first released by Sun Microsystems in 1995. Java is used to develop programs that run on Windows, Linux, and Mac computers. A lot of software will not work unless Java is installed.
What is JDK? The Java Development Kit (JDK) is used to develop Java applications. It includes the JRE, the JVM, an interpreter, and a compiler.
What is JRE? The Java Runtime Environment (JRE) provides the JVM and the standard class libraries needed to run Java applications.
What is JVM? The Java Virtual Machine (JVM) is used to execute Java applications. The JDK's compiler generates bytecode from our source code; the JVM executes the bytecode and shows the output.
Install Java 8
Step 1: Add the Java repository (PPA)
    sudo add-apt-repository ppa:webupd8team/java
Step 2: Update your package list
    sudo apt-get update
Step 3: Install Java
    sudo apt-get install oracle-java8-insta...

Big Data

What is Data? Data is information in raw, unprocessed form. It can be characters, text, numbers, images, audio, or video.
What is Big Data? Big data is a term for data sets that are too large and complex for traditional systems to process. Data that is structured, unstructured, or semi-structured and very large cannot be processed by relational database engines within a given time. Moreover, the data keeps growing exponentially over time. This type of data is called "big data".
Categories of Big Data
Structured: Structured data refers to any data that has fixed fields and records, for example RDBMS tables and CSV files.
Unstructured: Unstructured data refers to any data that does not have a predefined format, for example machine-generated logs and web pages.
Semi-Structured: Semi-structured data refers to any data that would be raw data or typed data in a ...
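One way to see the difference between the structured and unstructured categories is in how a value is pulled out of each. The sketch below uses two made-up sample records (they are not from any real system): the CSV record can be read by field position, while the log line has to be searched with a pattern.

```shell
#!/bin/sh
# Made-up sample records, for illustration only.
csv_record='101,Alice,Chennai'
log_line='2019-01-05 10:22:31 ERROR disk /dev/sda1 is 95% full'

# Structured: fields are fixed, so the second column is addressed by position.
name=$(printf '%s\n' "$csv_record" | cut -d, -f2)

# Unstructured: no predefined format, so we search for a pattern instead.
level=$(printf '%s\n' "$log_line" | grep -oE 'ERROR|WARN|INFO')

echo "name from CSV record: $name"
echo "level found in log line: $level"
```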