Hadoop is an open-source distributed framework designed to run on commodity hardware. Its three main components are HDFS, MapReduce and YARN. The Hadoop Distributed File System (HDFS) handles storage in the Hadoop framework. YARN provides a platform for generic and efficient processing of Big Data in Ha
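
To make the MapReduce component more concrete, here is a minimal sketch of the map, shuffle and reduce steps applied to a word count. It is written in plain Python and does not use any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data on hadoop", "hadoop stores big data"])
print(counts)
```

In a real cluster the map and reduce functions would run in parallel on different nodes and the input would be read from HDFS; this sketch runs everything in one process to show only the data flow.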
Execution Modes of Hadoop

Each component in Hadoop is configured using XML files located in Hadoop’s configuration directory. Properties common to all of Hadoop are set in core-site.xml, while HDFS, MapReduce and YARN properties are set in hdfs-site.xml, mapred-site.xml and yarn-site.x
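
As an illustration of what these XML configuration files look like, the sketch below parses a small core-site.xml fragment into a dictionary. The property names fs.defaultFS and hadoop.tmp.dir are standard Hadoop properties, but the values shown (a NameNode on localhost port 9000 and a temporary directory under /tmp) are assumed placeholders, and the helper parse_hadoop_config is hypothetical rather than part of Hadoop.

```python
import xml.etree.ElementTree as ET

# Illustrative core-site.xml content; the values are placeholders, not recommendations.
CORE_SITE_XML = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-example</value>
  </property>
</configuration>
"""


def parse_hadoop_config(xml_text):
    """Return a dict mapping each <name> to its <value> in a Hadoop config document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            # Treat a missing <value> element as an empty string.
            props[name.strip()] = (value or "").strip()
    return props


config = parse_hadoop_config(CORE_SITE_XML)
print(config["fs.defaultFS"])
```

The same name/value layout is used in hdfs-site.xml and mapred-site.xml, so the helper can be pointed at the contents of any of those files.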
Over the years, Hadoop has come a long way. Several applications have been developed to run on top of it and provide particular services to Hadoop users. Some of them are given below:

Database Storage:
HBase: It is a column-oriented store that allows random access to structured data in HDFS.

Data Pr
Pre-Requisite

Concepts you must know before doing this course:
Big Data Technology Landscape course
Basic understanding of UNIX / Linux commands
Basic understanding of RDBMS systems
Basic knowledge of distributed computing systems

Recommended resources to learn pre-requisite concepts:
Big Data Technology
Summary

In this course, we have learnt:
Fundamentals of big data and types of digital data
Sources of big data and the challenges faced by RDBMS systems
How big data is handled: big data technologies, NoSQL databases and Hadoop
Where big data is used: Big Data Analytics using Hadoop and its ecosystem
Machine Learning and Big Data
Common algorithms in Machine Learning
What is Association Rule Mining?
What is Collaborative Filtering?
What is Regression Analysis?
What is Clustering?
Big Data Project: Five Stage Execution