Posts

Showing posts with the label big data

What is Apache Hive

What is Apache Hive?

Hadoop is like a sea of tools and technologies that help us get our work done, and Hive is one of them. Hive runs on top of Hadoop. Apache Hive is a Hadoop component developed primarily for data analysts. Although Apache Pig was developed for a similar purpose, Hive is used more by researchers and programmers. It is an open-source data warehousing system used to query and analyze huge volumes of data stored in Hadoop HDFS. Hive supports data querying, data summarization, and data analysis. HiveQL is the query language in Hive. Hive translates HiveQL's SQL-like queries into MapReduce jobs and runs them on Hadoop. Hive also provides a shell where we can perform the basic operations that Hive supports. If we run a HiveQL query in the Hive shell, it launches a MapReduce job internally and returns the result. Hive offers schema flexibility as well as data serialisation and deserialisation.

Advantage of...
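To get a feel for how a SQL-like query can become a MapReduce job, here is a minimal sketch in plain Python. It does not use Hive or Hadoop at all; it only imitates the map, shuffle, and reduce steps for a simple grouped count. The table name `employees`, the column `dept`, and the sample rows are made up for illustration.

```python
from collections import defaultdict

# Hypothetical HiveQL query being imitated (table and column names are made up):
#   SELECT dept, COUNT(*) FROM employees GROUP BY dept;

# Made-up sample rows standing in for the "employees" table.
rows = [
    {"name": "asha", "dept": "sales"},
    {"name": "ben", "dept": "engineering"},
    {"name": "chen", "dept": "sales"},
]


def map_phase(records):
    """Emit one (dept, 1) pair per record, as a mapper would for COUNT(*)."""
    for record in records:
        yield record["dept"], 1


def shuffle(pairs):
    """Group the emitted values by key, imitating the shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


result = reduce_phase(shuffle(map_phase(rows)))
print(result)
```

The real translation done by Hive is far more involved, but the idea is the same: the GROUP BY column becomes the key, and the aggregate is computed in the reduce step.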

YARN - Yet Another Resource Negotiator

I hope you already know what Big Data, Hadoop, and HDFS are. If not, I suggest reading those topics before this one.

What is YARN?

YARN stands for Yet Another Resource Negotiator. It is one of the core components of Hadoop. YARN is used to manage the Hadoop cluster, for example by scheduling tasks and managing resources. In Hadoop V1, MapReduce handled all the resource-related details as well as the task and job details, which overloaded it. So, in Hadoop V2, resource management was split out into a separate component named YARN.

Components: Resource Manager and Node Manager

Resource Manager

It is the master node in YARN, and there is only one per cluster. It knows the details of the slave nodes. It takes the place of the JobTracker of MapReduce Version 1 (MRV1).

Resource Scheduler

The Resource Scheduler is responsible for allocating resources to applications. It does not perform any monitoring or tracking of events such as application failures or hardware failures.

App Manager

It maintains the...
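To make the scheduler's job of allocating resources to applications more concrete, here is a toy sketch in plain Python. It is not the YARN API: the `Node` and `ToyScheduler` classes, the first-fit rule, and the memory sizes are all assumptions made up for this example. Like the Resource Scheduler described above, it only hands out resources and does no monitoring.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical worker node with some free memory."""
    name: str
    free_memory_mb: int


@dataclass
class ToyScheduler:
    """A made-up first-fit scheduler; real YARN schedulers are far richer."""
    nodes: list
    allocations: list = field(default_factory=list)

    def allocate(self, app_id, memory_mb):
        """Place the request on the first node with enough free memory."""
        for node in self.nodes:
            if node.free_memory_mb >= memory_mb:
                node.free_memory_mb -= memory_mb
                self.allocations.append((app_id, node.name, memory_mb))
                return node.name
        # No node has enough capacity; this sketch simply reports failure.
        return None


scheduler = ToyScheduler(nodes=[Node("node-1", 4096), Node("node-2", 2048)])
first = scheduler.allocate("app-1", 3072)
second = scheduler.allocate("app-2", 2048)
third = scheduler.allocate("app-3", 2048)
print(first, second, third)
```

Here the third request gets nothing because both nodes are already too full, which is the kind of decision a scheduler has to make for every incoming application.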

Big Data

What is Data?

Data is information in a raw, unprocessed form. It can be characters, text, numbers, images, audio, or video.

What is Big Data?

Big data is a term that describes very large amounts of data. The term is used for data that is huge and complex. Data that is structured, unstructured, or semi-structured and so large that relational database engines cannot process it within a given time falls under this term. Moreover, such data keeps growing exponentially over time. This type of data is called "big data".

Categories of Big Data

Structured

Structured data refers to any data that has fixed fields and records, for example RDBMS tables and CSV files.

[Figure: example of structured data]

Unstructured

Unstructured data refers to any data that does not have a predefined format, for example machine-generated logs and web pages.

[Figure: example of unstructured data]

Semi-Structured

Semi-structured data refers to any data that would be raw data or typed data in a ...
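As a rough illustration of the three categories, here is a small sketch using only the Python standard library. The sample values are made up, and using JSON as the semi-structured example is my own assumption, since it is a common case of data whose fields can differ from record to record.

```python
import csv
import io
import json

# Structured: fixed fields and records, for example a CSV file.
csv_text = "id,name\n1,asha\n2,ben\n"
structured = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured (assumed example: JSON): records describe their own
# fields, and the fields can differ from one record to the next.
json_text = '[{"id": 1, "name": "asha"}, {"id": 2, "tags": ["new"]}]'
semi_structured = json.loads(json_text)

# Unstructured: no predefined format, for example a free-form log message.
log_line = "disk almost full on host-7, please check"
unstructured_words = log_line.split()

print(structured[0]["name"])
print(sorted(semi_structured[1].keys()))
print(len(unstructured_words))
```

Notice that the CSV rows can be read by column name straight away, the JSON records need a check for which keys are present, and the log line has to be split and interpreted by our own code.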