Developing with Spark for Big Data | Enterprise-Grade Spark Programming for the Hadoop & Big Data Ecosystem

Live Classroom
Duration: 5 days
Live Virtual Classroom
Duration: 5 days


The Developing with Spark for Big Data course introduces participants to enterprise-grade Spark programming, covering intermediate and advanced concepts and enabling them to work with the key components of Apache Spark to develop data science solutions. The course equips participants with the skills and knowledge to work with Apache Spark in real-world enterprises and make effective data-driven decisions. Hands-on exercises throughout ensure that participants gain a thorough understanding of all the concepts covered.

What You'll Learn

  • Basics of Spark architecture and applications
  • Executing Spark programs
  • Creating and manipulating both RDDs (Resilient Distributed Datasets) and Unified DataFrames
  • Persisting and restoring DataFrames
  • Essential NoSQL access
  • Integrating machine learning into Spark applications
  • Using Spark Streaming and Kafka to create streaming applications


  • Hadoop ecosystem
  • Hadoop YARN vs. Mesos
  • Spark vs. MapReduce
  • Spark with MapReduce – Lambda Architecture
  • Spark in the enterprise data science architecture

  • Spark Shell
  • RDDs: Resilient Distributed Datasets
  • Data frames
  • Spark 2 unified DataFrames
  • Spark sessions
  • Functional programming
  • Spark SQL
  • MLlib
  • Structured streaming
  • Spark R
  • Spark and Python

  • Coding with RDDs
  • Transformations
  • Actions
  • Lazy evaluation and optimization
  • RDDs and MapReduce

  • RDDs vs. DataFrames
  • Unified DataFrames (UDF) in Spark 2.0
  • Partitioning

  • Spark sessions
  • Running applications
  • Logging

  • RDD persistence
  • DataFrame and Unified DataFrame persistence
  • Distributed persistence

  • Streaming overview
  • Streams
  • Structured streaming
  • DStreams and Apache Kafka

  • Ingesting data
  • Parquet files
  • Relational databases
  • Graph databases (Neo4J, GraphX)
  • Interacting with Hive
  • Accessing Cassandra data
  • Document databases (MongoDB, CouchDB)

  • MapReduce and Lambda integration
  • Camel integration
  • Drools and Spark

  • MLlib and Mahout
  • Classification
  • Clustering
  • Decision trees
  • Decompositions
  • Pipelines
  • Spark packages

  • Spark SQL
  • SQL and DataFrames
  • Spark SQL and Hive
  • Spark SQL and JDBC

  • Graph APIs
  • GraphX
  • ETL in GraphX
  • Exploratory analysis
  • Graph computation
  • Pregel API overview
  • GraphX algorithms
  • Neo4J as an alternative

  • Using web notebooks (Zeppelin, Jupyter)
  • R on Spark
  • Python on Spark
  • Scala on Spark

  • Parallelizing Spark applications
  • Clustering concerns for developers

  • Monitoring Spark performance
  • Tuning memory
  • Tuning CPU
  • Tuning data locality
  • Troubleshooting

Who should attend

The course is highly recommended for:
  • Developers
  • Architects
  • Big Data professionals
  • Hadoop professionals


Prerequisites

Participants need to have experience working in a development role and an understanding of the Big Data and Hadoop ecosystem.

Interested in this Course?

    Ready to recode your DNA for GenAI?
    Discover how Cognixia can help.

    Get in Touch