Elite

HADOOP

About Course

  • Do you need to understand big data and how it will impact your business? This specialization is for you. You will gain an understanding of what insights big data can provide through hands-on experience with the tools and systems used by big data scientists and engineers. No prior programming experience is required. You will be guided through the basics of using Hadoop with MapReduce, Spark, Pig, and Hive.
  • You will see how to perform predictive modeling and use graph analytics to model problems.
  • This specialization will prepare you to ask the right questions about data, communicate effectively with data scientists, and do basic exploration of large, complex datasets.
  • To enhance your learning experience, you will also work on real-time, industry-based projects.

Description

What will I learn

Topics for this course

    1. Overview of Big Data Technologies
    2. Characteristics of Big Data
    3. Why Analyzing Big Data and Parallel Computing Are Important
    4. Big Data challenges & solutions
    5. Data Science vs Data Engineering
    6. Various products for Handling Big Data
    1. Working with HDFS
    2. HDFS Architecture
    3. Understanding the problem statement and the challenges of persisting such large data, to see the need for a distributed file system
    4. Understanding HDFS architecture to solve problems
    5. Understanding configuration and creating a directory structure to solve the given problem statement
    6. Setting up appropriate permissions to secure data for appropriate users
    7. Setting up Java Development with HDFS libraries to use HDFS Java APIs
    1. What is MapReduce
    2. Input and output formats
    3. Data Types in MapReduce
    4. Flow of MapReduce Jobs
    5. Word Count in MapReduce
    6. How to use Custom Input Formats
    7. Use case for Structured Data Sets
    8. Writing Custom Classes
    1. What is Hive
    2. Architecture of Hive
    3. Tables in Hive with Load Functions
    4. Query Optimization
    5. Partitioning and Bucketing
    6. Joins in Hive
    7. Indexing in Hive
    8. File Formats in Hive
    1. What is Sqoop
    2. Relation Between SQL and Hadoop
    3. Apache Sqoop
    4. Performing Sqoop Import
    5. Incremental and Conditional Imports
    6. Performing Sqoop Export
    1. What is Pig and ETL
    2. Introduction to Pig Architecture
    3. Introduction to Pig Latin
    4. How to Perform ETL
    5. Use cases of Pig
    6. Discuss Hive, Sqoop, Pig, HBase, Flume
    1. What is HBase
    2. Architecture of HBase
    3. CRUD operations in HBase
    4. Retrieval of HBase Data
    5. Introduction to Apache Oozie (Scheduler tool)
    1. Basic data types and literals used
    2. Classes, Traits, Control Structures of Scala
    3. Collections and Libraries of Scala
    1. Limitations of MapReduce in Hadoop
    2. Advanced MapReduce
    3. Batch vs. Real-time analytics
    4. YARN
    5. Application of stream processing
    6. Spark vs. the Hadoop Ecosystem
    1. Features of RDDs
    2. How to create RDDs
    3. RDD operations and methods
    4. Explain RDD functions and describe how to write different programs in Scala
    1. Explain the importance and features of Spark SQL
    2. Describe methods to convert RDDs to DataFrames
    3. Explain concepts of Spark SQL
    4. Describe the concept of Hive integration
    1. Use cases and techniques of Machine Learning
    2. Spark Configuration, Cluster Modes
    3. Describe the key concepts of Spark ML
    4. Concepts of an ML dataset, ML algorithms, and model selection via cross-validation

Why learn Hadoop