Overview
Become an expert Hadoop Administrator by getting hands-on with Hadoop clusters, including planning, deployment, and monitoring of the Hadoop Distributed File System (HDFS). The course also takes a hands-on approach to the Hadoop ecosystem: YARN, MapReduce, HDFS, Cloudera Manager, and Hadoop clusters with Hive, HBase, Pig, Flume, and Sqoop for RDBMS integration.
Cognixia’s Big Data Hadoop Administrator course is specifically designed to provide hands-on experience in installing, configuring, and managing the Apache Hadoop platform.
Curriculum
- a. Understanding Big Data Basics b. Big Data Use Cases c. Introduction to Hadoop d. Understanding Hadoop Ecosystem e. Introduction to HDFS
- a. Introduction to Namenode b. Introduction to Datanode
- a. Introduction to Secondary Namenode
- a. Introduction to MapReduce
- a. Introduction to JobTracker b. Introduction to TaskTracker
- a. Summarizing Hadoop Architecture b. Roles and Responsibilities of a Hadoop Administrator
- Linux internals
- i. Commands that are required ii. Linux basics
- Hadoop Cluster Installation Pre-requisites
- Pre-requisites of Hadoop Installation
- i. Software Downloads ii. Preparing your VM iii. Enabling VM with VMware iv. Understanding mandatory changes in the operating system
- Installation and Configuration
- i. Understanding Hadoop cluster installation modes ii. Understanding Hadoop Version 1 installation and configuration iii. Password-less SSH setup
- Hands-On Practice for creating a Hadoop cluster
- Helping individuals practice Hadoop cluster installation
- By the end of the module, students will understand how to plan a production Hadoop cluster, including its hardware and software requirements, performance tuning after cluster creation, and benchmarking.
- i. Hadoop 2.0 new features ii. YARN
- i. Understanding Resource Manager ii. Understanding Application Master iii. Understanding Node Manager iv. Understanding Hadoop 2 Job Execution Framework
- Hadoop 2 Multi-node cluster creation
- i. Pre-requisites of Hadoop Installation ii. Software Downloads iii. Preparing your VM iv. Enabling VM with VMware v. Understanding mandatory changes in the operating system vi. Installation and Configuration vii. Understanding Hadoop version 2 installation and configuration viii. Passwordless SSH setup
- Practice Hadoop 2 Multi-node Cluster Creation
- Helping individuals practice Hadoop 2 cluster installation
- a. Sample YARN job execution c. Understanding issues of Hadoop 1 d. Understanding improvements in Hadoop 2 e. Namenode Federation
- Enable segregation of HDFS using multiple Namenodes
- Namenode – High Availability
- i. Achieving Namenode High-Availability using Quorum Journal Manager ii. Achieving Namenode High-Availability using Network File System
- Implementation of NN High Availability
- Helping individuals achieve Namenode High Availability
- Hadoop Ecosystem Introduction
- Understanding the integration of Hadoop ecosystem
- Touching base with Hive
- i. What is Hive? ii. Architecture of Hive iii. Understanding Hive metastore concepts
- HBase
- i. Understanding HBase Basics ii. Understanding HBase Storage Model iii. Understanding HBase Architecture iv. Cluster Installation and Configuration
- Pig
- i. What is Pig? ii. How does Pig integrate with a Hadoop cluster? iii. Demo of Pig jobs using MapReduce
- Sqoop
- i. What is Sqoop? ii. How to import and export data between Hadoop and an RDBMS using Sqoop iii. Example of Sqoop jobs using MySQL
- Flume
- i. What is Flume? ii. Sample Flume jobs
- Understanding the internals of Cloudera Manager a. Understanding the automation of Hadoop installation using Cloudera Manager b. Understanding Cloudera Hadoop Distribution and Cloudera Manager c. Understanding the underlying directory structure of Cloudera Hadoop d. Cloudera Hadoop Cluster Installation – CDH