Overview
Cassandra (C*) is a massively scalable NoSQL database that provides high availability and fault tolerance, as well as linear scalability when adding new nodes to a cluster. This course provides an in-depth introduction to working with Cassandra and using it to create effective data models, while focusing on the practical aspects of working with C*. It covers the internal architecture you need to understand to make sound design decisions, CQL (Cassandra Query Language), and the Java APIs for writing Cassandra clients.
What You'll Learn
- Understand the needs addressed by C*
- Be familiar with the operation and structure of C*
- Be able to install and set up a C* database
- Use the C* tools, including cqlsh, nodetool and CCM (Cassandra Cluster Manager)
- Become familiar with C* architecture and how a C* cluster is structured
- Understand how data is distributed and replicated in a C* cluster
- Understand core C* data modelling concepts and use them to create well-structured data models
- Use data replication and eventual consistency intelligently
- Understand and use CQL to create tables and query for data
- Know and use the CQL data types (numerical, textual, uuid, etc.)
- Understand the various kinds of primary keys available (simple, compound and composite primary keys)
- Use more advanced capabilities like collections, counters, secondary indexes, CAS (Compare and Set), static columns and batches
- Become familiar with the Java client API
- Use the Java client API to write client programs that work with C*
- Build and use dynamic queries with QueryBuilder
- Understand and use asynchronous queries with the Java API
Curriculum
- Why we need Cassandra
- High level Cassandra overview
- Cassandra features
- Basic Cassandra installation and configuration
- Cassandra architecture overview
- Cassandra clusters and rings
- Data replication in Cassandra
- Cassandra consistency/eventual consistency
- Introduction to CQL
- Defining tables with a single-column primary key
- Using cqlsh for interactive querying
- Selecting and inserting/upserting data with CQL
- Data replication and distribution
- Basic data types (including uuid, timeuuid)
- Defining a compound primary key
- CQL for compound primary keys
- Partition keys and data distribution
- Clustering columns
- Overview of internal data organization
- Additional querying capabilities
- Result ordering – ORDER BY and CLUSTERING ORDER BY
- UPDATE and DELETE queries
- Result filtering, ALLOW FILTERING
- Batch queries
- Data modelling guidelines
- Denormalization
- Data modelling workflow
- Data modelling principles
- Primary key considerations
- Composite partition keys
- Defining composite partition keys with CQL
- Data distribution with composite partition keys
- Overview of internal data organization
- Indexing
- Primary/partition keys and pagination with token()
- Secondary indexes and usage guidelines
- Cassandra counters
- Counter structure and definition
- Using counters
- Counter limitations
- Cassandra collections
- Collection structure and uses
- Defining collections (set, list, and map)
- Querying collections (including INSERT, UPDATE, DELETE)
- Collection limitations
- Overview of internal storage organization
- Static column – overview and usage
- Static column guidelines
- Materialized view: Overview and usage
- Materialized view guidelines
- Overview of consistency in Cassandra
- CAP theorem
- Eventual (tunable) consistency in C* – One, Quorum, All
- Choosing CL One
- Choosing CL Quorum
- Achieving immediate consistency
- Using other consistency levels
- Internal repair mechanisms (Read repair, hinted handoff)
- Overview of lightweight transactions
- Using LWT, the [applied] column
- IF EXISTS, IF NOT EXISTS, Other IF conditions
- Basic CAS internals
- Overhead and guidelines
- Dealing with write failures
- Unavailable nodes and node failure
- Requirements for write operations
- Key and row caches
- Cache overview
- Usage guidelines
- Multi-data center support
- Multi-data center overview
- Replication factor configuration
- Additional consistency levels – LOCAL_QUORUM, EACH_QUORUM
- Deletes
- CQL for Deletion
- Tombstones
- Usage Guidelines
- Java client API overview
- Introduction to the Java client API
- Architecture and Features
- Connecting to a Cluster
- Cluster and Cluster.Builder
- Contact Points, Connecting to a Cluster
- Session Overview and API
- Working with Sessions
- The Query API
- Query API overview
- Dynamic Queries, Statement, SimpleStatement
- Processing Query Results, ResultSet, Row
- PreparedStatement, BoundStatement
- Binding Values and Querying with PreparedStatements
- CQL to Java Type Mapping
- Working with UUIDs
- Working with Time/Date Values
- Working with Batches of SimpleStatement and PreparedStatement
- Dynamic Queries and QueryBuilder
- QueryBuilder Overview and API
- Building SELECT, DELETE, INSERT, and UPDATE Queries
- Creating WHERE Clauses
- Other Query Examples
- Configuring Query Behavior
- Setting LIMIT and TTL
- Working with Consistency
- Using LWT
- Working with Driver Policies
- Load Balancing Policies – RoundRobinPolicy, DCAwareRoundRobinPolicy
- Retry Policies – DefaultRetryPolicy, DowngradingConsistencyRetryPolicy, Other Policies
- Reconnection Policies
- Asynchronous Querying Overview
- Synchronous vs. Asynchronous Querying
- Executing Asynchronous Queries
- java.util.concurrent.Future
- Cassandra ResultSetFuture
Who should attend
The course is highly recommended for:
- Java developers
- Database administrators
- Spring developers
- Architects
- Full stack developers/engineers
- DevOps developers/engineers