Overview
This three-day introductory course focuses on helping participants gain a thorough understanding of maintaining a Hadoop cluster and its components. Compared to traditional cluster architectures, Hadoop is designed for massive scalability and provides built-in fault tolerance through data replication. The course also covers how to install, configure and maintain Hadoop on Linux in various computing environments.
What You'll Learn
- Learn to install, configure and maintain the Apache Hadoop framework
- Explore MapReduce, YARN and Spark
- Explore Mahout and MLlib, as well as other frameworks
- Explore Hadoop architecture (MapReduce, YARN, HDFS, Spark, Cassandra, HBase, Pig, Hive)
- Install Hadoop
- Test-run Hadoop programs and explore basic tests (see the smoke-test sketch after this list)
- Learn to optimize and performance-tune Hadoop
- Explore installing Hadoop in the cloud and HBase (optional)
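
Below is a minimal sketch of the kind of basic test covered in the course: connecting to HDFS with the standard Hadoop FileSystem API and listing the root directory after an installation. The NameNode address hdfs://namenode:8020 is a placeholder for your own cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster; "hdfs://namenode:8020" is a placeholder.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        // Connect and list the root directory as a basic end-to-end check.
        try (FileSystem fs = FileSystem.get(conf)) {
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath()
                        + "  (replication=" + status.getReplication() + ")");
            }
        }
    }
}
```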
 
Curriculum
- Hadoop history and concepts
- Ecosystem
- Distributions
- High-level architecture
- Hadoop myths
- Hadoop challenges (hardware / software)
 
- Selecting software and Hadoop distributions
- Sizing the cluster and planning for growth
- Selecting hardware and network
- Rack topology
- Installation
- Multi-tenancy
- Directory structure and logs (see the configuration sketch after this list)
- Benchmarking
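
As a flavour of the directory-layout and sizing material, the sketch below reads a few of the properties this module discusses through Hadoop's Configuration API. It assumes core-site.xml and hdfs-site.xml are on the classpath; the property names are standard Hadoop 2.x+ keys.

```java
import org.apache.hadoop.conf.Configuration;

public class ClusterLayoutReport {
    public static void main(String[] args) {
        // Loads core-default.xml and core-site.xml from the classpath;
        // hdfs-site.xml is added explicitly here.
        Configuration conf = new Configuration();
        conf.addResource("hdfs-site.xml");

        // Properties that drive sizing and on-disk layout decisions.
        System.out.println("fs.defaultFS          = " + conf.get("fs.defaultFS"));
        System.out.println("hadoop.tmp.dir        = " + conf.get("hadoop.tmp.dir"));
        System.out.println("dfs.replication       = " + conf.get("dfs.replication", "3"));
        System.out.println("dfs.blocksize         = " + conf.get("dfs.blocksize"));
        System.out.println("dfs.datanode.data.dir = " + conf.get("dfs.datanode.data.dir"));
        System.out.println("dfs.namenode.name.dir = " + conf.get("dfs.namenode.name.dir"));
    }
}
```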
 
- Concepts (horizontal scaling, replication, data locality, rack awareness)
- Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
- Health monitoring (see the capacity-report sketch after this list)
- Command-line and browser-based administration
- Adding storage and replacing defective drives
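
The health-monitoring topic can be previewed with the sketch below, which reports overall filesystem capacity via FileSystem.getStatus() and raises the replication factor of one file. The NameNode URI and the file path are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;
import org.apache.hadoop.fs.Path;

public class HdfsHealthCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");  // placeholder address

        try (FileSystem fs = FileSystem.get(conf)) {
            // Aggregate capacity figures, similar to the summary on the NameNode web UI.
            FsStatus status = fs.getStatus();
            long gb = 1024L * 1024L * 1024L;
            System.out.println("Capacity : " + status.getCapacity() / gb + " GB");
            System.out.println("Used     : " + status.getUsed() / gb + " GB");
            System.out.println("Remaining: " + status.getRemaining() / gb + " GB");

            // Raise the replication factor of a single file (path is a placeholder).
            Path important = new Path("/data/important.csv");
            if (fs.exists(important)) {
                fs.setReplication(important, (short) 3);
            }
        }
    }
}
```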
 
- Parallel computing before MapReduce: comparing HPC and Hadoop administration
- MapReduce cluster loads
- Nodes and daemons (JobTracker, TaskTracker)
- MapReduce UI walkthrough
- MapReduce configuration
- Job config (see the WordCount driver sketch after this list)
- Job schedulers
- Administrator view of MapReduce best practices
- Optimizing MapReduce
- Foolproofing MapReduce: what to tell your programmers
- YARN: architecture and use
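
To ground the job configuration and scheduler topics, here is a self-contained WordCount sketch in the spirit of the classic Hadoop example. The scheduler queue name "default" and the input/output paths passed on the command line are assumptions to replace for your own cluster.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Emits (word, 1) for every token in the input line.
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Sums the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Submit to a specific scheduler queue; "default" is an assumption.
        conf.set("mapreduce.job.queuename", "default");

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not already exist

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```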
 
- Hardware monitoring
- System software monitoring
- Hadoop cluster monitoring
- Adding and removing servers and upgrading Hadoop
- Backup, recovery, and business continuity planning
- Cluster configuration tweaks
- Hardware maintenance schedule
- Oozie scheduling for administrators
- Securing your cluster with Kerberos (see the keytab login sketch after this list)
- The future of Hadoop
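
As a preview of the Kerberos module, the sketch below shows the usual pattern for a client that authenticates to a secured cluster from a keytab before touching HDFS. The principal name and keytab path are placeholders for site-specific values.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");        // placeholder address
        conf.set("hadoop.security.authentication", "kerberos");  // enable Kerberos auth

        // Log in from a keytab; principal and keytab path are placeholders.
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(
                "hdfs-admin@EXAMPLE.COM", "/etc/security/keytabs/hdfs-admin.keytab");

        // Any subsequent HDFS call runs as the authenticated principal.
        try (FileSystem fs = FileSystem.get(conf)) {
            System.out.println("Authenticated as: "
                    + UserGroupInformation.getCurrentUser().getUserName());
            System.out.println("Home directory  : " + fs.getHomeDirectory());
        }
    }
}
```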
 
Who should attend
This course is highly recommended for:
- Hadoop administrators
- Software administrators
- System administrators
 
Prerequisites
Participants need to be comfortable navigating the Linux command line and have basic knowledge of a Linux text editor, such as vi or nano, for editing code and configuration files.