Showing posts from July, 2018

What is Hadoop

Before getting into Hadoop, we need to understand the issues related to Big Data and traditional processing systems. The previous blog post discussed Big Data in detail. Hadoop is an open-source batch-processing framework written in Java, used to store and analyze large data sets. It is used by Google, Yahoo, Facebook, Twitter, LinkedIn, and others.

Components of Hadoop

HDFS - Hadoop Distributed File System, used to store huge data sets across the cluster.
YARN - Yet Another Resource Negotiator, used for managing cluster resources.
MapReduce - A software framework for processing huge data sets in parallel on a cluster.

Features of Hadoop

Cluster storage - Hadoop splits a single data set into multiple blocks and stores them across the cluster (more than one storage system working together) with replication (default factor 3). This increases performance and reliability.
Distributed computing - A single problem is divided into multiple sub