Overview
This three-day introductory course focuses on helping participants gain a thorough understanding of maintaining a Hadoop cluster and its components. Compared to other cluster architectures, Hadoop is designed for massive scalability and offers superior fault tolerance. The course also covers how to install, configure, and maintain Hadoop on Linux in various computing environments.
What You'll Learn
- Learn to install, configure and maintain the Apache Hadoop framework
- Explore MapReduce, YARN and Spark
- Explore Mahout and MLlib as well as other frameworks
- Explore Hadoop architecture (MapReduce, YARN, HDFS, Spark, Cassandra, HBase, Pig, Hive)
- Install Hadoop
- Test-run Hadoop programs (Explore basic tests)
- Learn to optimize and performance-tune Hadoop
- Explore installing Hadoop for the cloud and HBase (optional)
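As a taste of the hands-on portion, the post-install test run typically looks like the sketch below. The examples jar path and the use of `$HADOOP_HOME` are assumptions; the exact location varies by Hadoop version and distribution.

```shell
# Verify the cluster accepts MapReduce jobs by running the bundled
# Pi-estimation example (2 map tasks, 10 samples each)
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10

# Confirm HDFS is reachable from the client by listing the filesystem root
hdfs dfs -ls /
```

Both commands require a running cluster; a non-zero exit status from either is a quick first signal that installation or configuration needs attention.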
Curriculum
- Hadoop history and concepts
- Ecosystem
- Distributions
- High-level architecture
- Hadoop myths
- Hadoop challenges (hardware / software)
- Selecting software and Hadoop distributions
- Sizing the cluster and planning for growth
- Selecting hardware and network
- Rack topology
- Installation
- Multi-tenancy
- Directory structure and logs
- Benchmarking
- Concepts (horizontal scaling, replication, data locality, rack awareness)
- Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
- Health monitoring
- Command-line and browser-based administration
- Adding storage and replacing defective drives
- Parallel computing before MapReduce: comparing HPC and Hadoop administration
- MapReduce cluster loads
- Nodes and daemons (JobTracker, TaskTracker)
- MapReduce UI walk-through
- MapReduce configuration
- Job config
- Job schedulers
- Administrator view of MapReduce best practices
- Optimizing MapReduce
- Foolproofing MapReduce: what to tell your programmers
- YARN: architecture and use
- Hardware monitoring
- System software monitoring
- Hadoop cluster monitoring
- Adding and removing servers and upgrading Hadoop
- Backup, recovery, and business continuity planning
- Cluster configuration tweaks
- Hardware maintenance schedule
- Oozie scheduling for administrators
- Securing your cluster with Kerberos
- The future of Hadoop
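The health-monitoring and node-management topics above center on a handful of standard administration commands. A minimal sketch, assuming a running cluster with the `hdfs` and `yarn` CLIs on the path:

```shell
# Cluster-wide HDFS summary: capacity, and live vs. dead DataNodes
hdfs dfsadmin -report

# Scan the filesystem for missing, corrupt, or under-replicated blocks
hdfs fsck /

# List the NodeManagers currently registered with YARN
yarn node -list

# After adding a host to the excludes file, tell the NameNode to re-read
# its host lists (used for graceful DataNode decommissioning)
hdfs dfsadmin -refreshNodes
```

The same information is exposed in the NameNode and ResourceManager web UIs, which the course covers under browser-based administration.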
Who should attend
This course is highly recommended for:
- Hadoop administrators
- Software administrators
- System administrators
Prerequisites
Participants should be comfortable navigating the Linux command line and have a basic working knowledge of a Linux text editor, such as vi or nano, for editing code.