Showing posts from October, 2019

Apache Hadoop MapReduce

Before diving into this tutorial, I suggest you read about Big Data, Hadoop, HDFS and YARN if you are not already familiar with those topics.

What is MapReduce

MapReduce is the Apache Hadoop framework for processing large amounts of data in parallel on a Hadoop cluster. It works in a divide-and-conquer manner: the input is divided into pieces, each piece is processed independently, and the partial results are then combined.

Components

There are two key components in Hadoop MapReduce: the Mapper and the Reducer.

Mapper

The Mapper takes an input split and processes each record in it. The result of processing an input split is a collection of key-value pairs, which is persisted on the local disk of the node that ran the map task (not in HDFS). The number of mappers is determined by the input splits: there are as many map tasks as there are input splits.

Reducer

The Reducer takes the Mapper output as its input, processes that intermediate result (a collection of key-value pairs), and combines the values that share the same key to create a smaller set of key-value pairs. The final output wi
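To make the Mapper and Reducer flow concrete, here is a minimal sketch in plain Java that simulates a word count in a single process. It is not the Hadoop API: real jobs extend `org.apache.hadoop.mapreduce.Mapper` and `Reducer` and run their tasks in parallel across the cluster. The class `WordCountSketch`, its methods `map`, `reduce` and `run`, and the word-count task itself are illustrative assumptions. Each string in the input list stands in for one input split, so one `map` call runs per split, and a simple grouping step stands in for Hadoop's shuffle and sort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process simulation of a MapReduce word count.
// No Hadoop dependency; each String passed to run() plays the role of one input split.
class WordCountSketch {

    // Mapper: processes one input split and emits an intermediate (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {            // skip empty tokens from leading whitespace
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reducer: combines all values that share one key into a single value.
    // The key is unused here but kept to mirror the (key, values) shape of a reduce call.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> splits) {
        // Map phase: one map call per input split (Hadoop would run these in parallel).
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String split : splits) {
            intermediate.addAll(map(split));
        }

        // Shuffle/sort stand-in: group intermediate values by key (TreeMap keeps keys sorted).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : intermediate) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }

        // Reduce phase: one reduce call per distinct key, giving a smaller set of pairs.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("deer bear river", "car car river", "deer car bear");
        System.out.println(run(splits)); // prints {bear=2, car=3, deer=2, river=2}
    }
}
```

With the three sample splits in `main`, three map calls run (one per split), and the reduce step collapses the nine intermediate pairs into four output pairs, which is the "smaller set of key-value pairs" described above.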