What is Hadoop?
Before diving into Hadoop, we have to understand the issues with Big Data and traditional processing systems. In the previous blog, we already discussed Big Data in detail. Hadoop is an open-source batch processing framework written in Java, used to store and analyze large data sets. Inspired by Google's GFS and MapReduce papers, it is used by companies such as Yahoo, Facebook, Twitter, and LinkedIn.
Components of Hadoop
- HDFS - Hadoop Distributed File System, used to store huge data sets across the cluster.
- YARN - Yet Another Resource Negotiator, used to manage resources and schedule jobs across the cluster.
- MapReduce - A software framework for processing huge data sets in parallel on the cluster (a word-count sketch follows this list).
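To make the MapReduce component concrete, here is a minimal word-count sketch against the classic org.apache.hadoop.mapreduce API. It is the standard tutorial-style example, not production code; the input and output paths come from the command line, and the job assumes a cluster (or local-mode) configuration on the classpath.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every word in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a JAR, it would run with something like `hadoop jar wordcount.jar WordCount /input /output`; the map tasks run in parallel across the cluster and the reduce tasks combine their results.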
Features of Hadoop
- Cluster storage - Hadoop splits a single data set into blocks and stores them across the cluster (more than one storage system working together) with replication (default 3). This increases both performance and reliability (a replication sketch follows this list).
- Distributed computing - A single problem is divided into multiple sub-problems, and each sub-problem is solved by a different computer. The computers communicate with each other to avoid duplicating work, and once all sub-problems are done, the results are combined into a single solution.
- Commodity hardware - Commodity hardware is nothing but inexpensive, widely available hardware. We can use ordinary systems in the cluster; there is no need to buy expensive specialized servers.
- Parallel processing - A single task is split into many subtasks, and each subtask runs on a separate CPU to reduce the overall running time.
- High throughput - Processes huge data sets in less time by spreading the work across the cluster. (As a batch framework, Hadoop favors throughput over low latency.)
- Data availability - Data is stored across the cluster with replication. If any machine in the cluster goes down, its data is still available on another machine.
- Fault tolerance - The system keeps working as usual, without any data loss, even if some machines fail. This is one of the main advantages of Hadoop.
- Horizontal scalability - We can add or remove servers in the cluster without interrupting the existing ones.
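As a concrete sketch of cluster storage and replication, the snippet below writes a file through the HDFS Java API and overrides its replication factor. The path /user/demo/sample.txt and the factor of 2 are assumptions for illustration; the client picks up core-site.xml and hdfs-site.xml from the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
  public static void main(String[] args) throws Exception {
    // Reads the cluster configuration from the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical path used for illustration.
    Path file = new Path("/user/demo/sample.txt");

    // Write a small file; HDFS splits it into blocks and replicates them.
    try (FSDataOutputStream out = fs.create(file)) {
      out.writeUTF("hello hadoop");
    }

    // Override the cluster default (typically 3) for this one file.
    fs.setReplication(file, (short) 2);

    FileStatus status = fs.getFileStatus(file);
    System.out.println("Replication factor: " + status.getReplication());
  }
}
```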
Hadoop Daemons
- NameNode - The master node that stores metadata for all files, such as the number of blocks, their locations, and replicas. It manages the slave nodes (a block-metadata sketch follows this list).
- Secondary NameNode - Also called the Checkpoint Node. Despite the name, it is not a live backup for the NameNode; it regularly gets the FsImage and EditLogs from the NameNode and merges the EditLogs into the FsImage.
- DataNode - The slave node that stores the actual data in HDFS and performs read and write operations as requested.
- ResourceManager - Runs on the master node; there is one per cluster. It allocates cluster resources among running applications.
- NodeManager - Runs on each slave node and manages the containers that execute work on that node.
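To see the NameNode's role as the metadata keeper, this sketch asks it where the blocks of a file live; the hosts it prints are the DataNodes holding the replicas. The path /user/demo/sample.txt is again an assumed example, and a reachable cluster configuration is expected on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfoDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical file assumed to already exist in HDFS.
    Path file = new Path("/user/demo/sample.txt");
    FileStatus status = fs.getFileStatus(file);

    // The NameNode answers this metadata query; the DataNodes listed
    // by getHosts() are the ones actually holding each block replica.
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.println("Block at offset " + block.getOffset()
          + " stored on: " + String.join(", ", block.getHosts()));
    }
  }
}
```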