Overview
Apache Cassandra is an open source distributed database management system designed to provide high availability with no single point of failure while handling massive volumes of data across many commodity servers.
What You'll Learn
The Cassandra training course is designed to build a deep understanding of Apache Cassandra for processing very large volumes of data streaming in at high speed and for retrieving valuable insights from that data.
Duration: 24 Hours
Curriculum
- A brief intro to NoSQL
- CAP theorem
- When to use NoSQL
- Columnar storage
- NoSQL ecosystem
- Architecture and Design
- Cassandra nodes, clusters, datacenters
- Keyspaces, tables, rows and columns
- Partitioning, replication, tokens
- Quorum and consistency levels
- A brief intro to CQL
- CQL data types
- Creating keyspaces & tables
- Choosing columns and types
- Choosing primary keys
- Data layout for rows and columns
- Time to live (TTL)
- Querying with CQL
- CQL updates
- Collections (list / map / set)
- Creating and using secondary indexes
- Composite keys (partition keys and clustering keys)
- Time series data
- Best practices for time series data
- Counters
- Lightweight transactions (LWT)
- Labs: creating and using indexes; modeling time series data
- Deep dive into the Cassandra design
- SSTables, memtables, commit log
- Hardware selection
- Cassandra distributions
- Communication between Cassandra nodes
- Writing data to and reading data from the storage engine
- Data directories
- Anti-entropy operations
- Cassandra Compaction
- Choosing and implementing compaction strategies
- Cassandra best practices for garbage collection, composition, etc.
- Troubleshooting tools and tips
Prerequisites
Basic knowledge of Linux