Overview
This three-day introductory course focuses on helping participants gain a thorough understanding of maintaining a Hadoop cluster and its components. Compared to traditional cluster architectures, Hadoop is designed for massive scalability and provides built-in fault tolerance through data replication. The course also covers how to install, configure and maintain Hadoop on Linux in various computing environments.
What You'll Learn
- Learn to install, configure and maintain the Apache Hadoop framework
- Explore MapReduce, YARN and Spark
- Explore Mahout and MLlib, as well as other frameworks
- Explore Hadoop architecture (MapReduce, YARN, HDFS, Spark, Cassandra, HBase, Pig, Hive)
- Install Hadoop
- Test-run Hadoop programs and explore basic tests (see the smoke-test sketch after this list)
- Learn to optimize and performance-tune Hadoop
- Explore installing Hadoop in the cloud and HBase (optional)
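
Below is a minimal sketch of the kind of basic test covered in the course: connecting to HDFS with the standard Hadoop FileSystem API and listing the root directory after an installation. The NameNode address hdfs://namenode:8020 is a placeholder for your own cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster; "hdfs://namenode:8020" is a placeholder.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        // Connect and list the root directory as a basic end-to-end check.
        try (FileSystem fs = FileSystem.get(conf)) {
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath()
                        + "  (replication=" + status.getReplication() + ")");
            }
        }
    }
}
```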
 
Curriculum
- Hadoop history and concepts
- Ecosystem
- Distributions
- High-level architecture
- Hadoop myths
- Hadoop challenges (hardware / software)
 
- Selecting software and Hadoop distributions
- Sizing the cluster and planning for growth
- Selecting hardware and network
- Rack topology
- Installation
- Multi-tenancy
- Directory structure and logs (see the configuration sketch after this list)
- Benchmarking
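
As a flavour of the directory-layout and sizing material, the sketch below reads a few of the properties this module discusses through Hadoop's Configuration API. It assumes core-site.xml and hdfs-site.xml are on the classpath; the property names are standard Hadoop 2.x+ keys.

```java
import org.apache.hadoop.conf.Configuration;

public class ClusterLayoutReport {
    public static void main(String[] args) {
        // Loads core-default.xml and core-site.xml from the classpath;
        // hdfs-site.xml is added explicitly here.
        Configuration conf = new Configuration();
        conf.addResource("hdfs-site.xml");

        // Properties that drive sizing and on-disk layout decisions.
        System.out.println("fs.defaultFS          = " + conf.get("fs.defaultFS"));
        System.out.println("hadoop.tmp.dir        = " + conf.get("hadoop.tmp.dir"));
        System.out.println("dfs.replication       = " + conf.get("dfs.replication", "3"));
        System.out.println("dfs.blocksize         = " + conf.get("dfs.blocksize"));
        System.out.println("dfs.datanode.data.dir = " + conf.get("dfs.datanode.data.dir"));
        System.out.println("dfs.namenode.name.dir = " + conf.get("dfs.namenode.name.dir"));
    }
}
```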
 
- Concepts (horizontal scaling, replication, data locality, rack awareness)
- Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
- Health monitoring (see the capacity-report sketch after this list)
- Command-line and browser-based administration
- Adding storage and replacing defective drives
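
The health-monitoring topic can be previewed with the sketch below, which reports overall filesystem capacity via FileSystem.getStatus() and raises the replication factor of one file. The NameNode URI and the file path are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;
import org.apache.hadoop.fs.Path;

public class HdfsHealthCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");  // placeholder address

        try (FileSystem fs = FileSystem.get(conf)) {
            // Aggregate capacity figures, similar to the summary on the NameNode web UI.
            FsStatus status = fs.getStatus();
            long gb = 1024L * 1024L * 1024L;
            System.out.println("Capacity : " + status.getCapacity() / gb + " GB");
            System.out.println("Used     : " + status.getUsed() / gb + " GB");
            System.out.println("Remaining: " + status.getRemaining() / gb + " GB");

            // Raise the replication factor of a single file (path is a placeholder).
            Path important = new Path("/data/important.csv");
            if (fs.exists(important)) {
                fs.setReplication(important, (short) 3);
            }
        }
    }
}
```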
 
- Parallel computing before MapReduce: comparing HPC and Hadoop administration
- MapReduce cluster loads
- Nodes and daemons (JobTracker, TaskTracker)
- MapReduce UI walkthrough
- MapReduce configuration
- Job config (see the WordCount driver sketch after this list)
- Job schedulers
- Administrator view of MapReduce best practices
- Optimizing MapReduce
- Foolproofing MapReduce: what to tell your programmers
- YARN: architecture and use
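
To ground the job configuration and scheduler topics, here is a self-contained WordCount sketch in the spirit of the classic Hadoop example. The scheduler queue name "default" and the input/output paths passed on the command line are assumptions to replace for your own cluster.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Emits (word, 1) for every token in the input line.
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Sums the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Submit to a specific scheduler queue; "default" is an assumption.
        conf.set("mapreduce.job.queuename", "default");

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not already exist

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```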
 
- Hardware monitoring
- System software monitoring
- Hadoop cluster monitoring
- Adding and removing servers and upgrading Hadoop
- Backup, recovery, and business continuity planning
- Cluster configuration tweaks
- Hardware maintenance schedule
- Oozie scheduling for administrators
- Securing your cluster with Kerberos (see the keytab login sketch after this list)
- The future of Hadoop
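
As a preview of the Kerberos module, the sketch below shows the usual pattern for a client that authenticates to a secured cluster from a keytab before touching HDFS. The principal name and keytab path are placeholders for site-specific values.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");        // placeholder address
        conf.set("hadoop.security.authentication", "kerberos");  // enable Kerberos auth

        // Log in from a keytab; principal and keytab path are placeholders.
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(
                "hdfs-admin@EXAMPLE.COM", "/etc/security/keytabs/hdfs-admin.keytab");

        // Any subsequent HDFS call runs as the authenticated principal.
        try (FileSystem fs = FileSystem.get(conf)) {
            System.out.println("Authenticated as: "
                    + UserGroupInformation.getCurrentUser().getUserName());
            System.out.println("Home directory  : " + fs.getHomeDirectory());
        }
    }
}
```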
 
Who should attend
This course is highly recommended for:
- Hadoop administrators
- Software administrators
- System administrators
 
Prerequisites
Participants need to be comfortable navigating the Linux command line and have basic knowledge of a Linux text editor, such as vi or nano, for editing code and configuration files.