• Glossary
  • Support
  • Downloads
  • DataStax Home
Get Live Help
Expand All
Collapse All

CQL for Astra DB Serverless

  • Overview
    • Cassandra structure
    • About CQL
    • CQL quick reference
    • Known issues
  • Getting started
    • Vector Search Quickstart
    • CQL Quickstart
    • SAI Quickstart
  • Planning
    • Data modeling
      • Data modeling concepts
      • Data modeling analysis
  • Connecting
    • cqlsh (startup options)
      • Accessing data using CQL
        • Starting the standalone cqlsh tool
        • Starting cqlsh
    • Developing
      • Introduction to CQL
      • Managing keyspaces
        • Keyspace concepts
        • Check a keyspace
      • Managing tables
        • Table concepts
        • Data types
        • Create a table
        • Check a table
        • Alter a table
        • Truncate a table
        • Drop a table
      • Managing columns
        • Managing columns
        • Create a table column
        • Check column existence
        • Alter a table column
        • Drop a column
        • User-defined type (UDT) overview
          • Create a user-defined type (UDT)
          • Check for user-defined type (UDT) existence
          • Alter a user-defined type (UDT)
          • Drop a user-defined type (UDT)
      • Indexing Tables
        • Indexing concepts
        • Storage-Attached Indexing (SAI) Overview
          • SAI concepts
          • SAI Quickstart
          • SAI FAQ
          • Create SAI index
          • Alter SAI index
          • Drop SAI index
          • Querying with SAI
          • SAI write path and read path
        • Secondary indexes (2i) overview
          • Secondary index (2i) concepts
          • When to use an index
          • Building and maintaining secondary indexes (2i)
          • Create a secondary index (2i)
            • Create a 2i on a collection column
          • Check secondary index (2i) existence
          • Alter a secondary index (2i)
          • Drop a secondary index (2i)
      • Inserting and updating data
        • Insert simple data
        • Set column
        • List fields
        • Map fields
        • Tuple
        • User-defined type (UDT) column
        • Insert static data
        • UUID and timeuuid column
        • Date column
        • Time column
        • Timestamp column
        • Duration column
        • Update counter
        • Lightweight transactions (LWTs)
        • Inserting JSON formatted values
        • Batching inserts and updates
          • Batching inserts, updates, and deletes
          • Good use of BATCH statement
          • Misuse of BATCH statement
        • Importing data from a CSV file
        • Setting the time-to-live (TTL) for value
          • Inserting, updating, and deleting data
      • Deleting data
        • Deleting values from a column or entire row
        • Expiring data with time-to-live
        • Expiring data with TTL example
      • Querying and retrieving data
        • Using a SELECT statement
        • Restricting queries using WHERE clauses
          • Partition keys
          • Clustering columns
          • Retrieval using the IN keyword
          • Sorting and limiting results
        • Returning data from collection columns
        • Aggregating results
        • Using user-defined functions (UDF)
        • Applying user-defined aggregate (UDA) functions
          • Testing performance impact using tracing
        • Formatting query results as JSON
        • Retrieving the write timestamp
        • Using tracing
          • Tracing performance
        • DSE Search index querying
        • Search index filtering best practices
          • Search index syntax
          • search-index:si-query-syntax.adoc#siIndexedColumns
          • search-index:si-query-syntax.adoc#basicSolrSyntax
          • search-index:si-query-syntax.adoc#queriesJSON
          • search-index:si-query-syntax.adoc#siEscapeCharacters
        • Native CQL search queries
        • Identifying the partition key
        • Filtering on terms
          • Filtering on words, phrases, or substrings
          • Fuzzy search
        • UDT query examples
        • Querying a CQL collection set
        • Using date ranges in solr_query
        • Geospatial queries for Point and LineString
        • Spatial queries with polygons require JTS
        • Using dynamic fields
        • Limiting results and paging
        • Restricted query routing
    • Securing overview
      • Security quickstart
      • Using cqlsh with authentication
      • System tables for authentication and authorization
  • Cycling Examples
    • Setting up the cycling keyspace
    • User-defined types (UDTs)
      • basic_info
      • basic_info_withTTL
      • fullname
      • race
    • User-defined functions (UDFs) and user-defined aggregates (UDAs)
      • flog
      • left
      • average_state
      • average_final
      • Cycling user-defined aggregate team_average
    • Tables
      • birthday_list
      • calendar
      • comments
      • country_flag
      • cyclist_alt_stats
      • cyclist_base
      • cyclist_career_teams
      • cyclist_category
      • cyclist_emails
      • cyclist_expenses
      • cyclist_id
      • cyclist_name
      • cyclist_points
      • cyclist_races
      • cyclist_sponsors
      • cyclist_stats
      • cyclist_teams
      • events
      • popular_count
      • race_sponsors
      • race_starts
      • race_times
      • race_winners
      • rank_by_year_and_name
      • team_average
      • upcoming_calendar
    • Materialized views (MVs)
      • cyclist_by_age-mv
    • Search indexes
      • Search index examples
      • comments search index
    • Access control
      • Cycling internal
    • Cycling queries
      • Aggregating cycling data
    • Reference
      • CQL quick reference
      • Understanding the CQL command syntax
      • Data types
        • Creating a blob column
        • Date, time, and timestamp format
      • Operators
      • CQL native functions
      • CQL native aggregates
      • System and virtual tables
        • System tables
          • Keyspace tables
          • Querying system tables
          • Functions, aggregates, and user types
        • Virtual tables
          • Virtual keyspaces and tables
          • Virtual tables for SAI indexes and SSTables
      • cqlsh (startup options)
        • Configuring a cqlshrc file
        • Starting cqlsh on a search node
      • CQL shell (cqlsh) reference
        • CAPTURE
        • CLEAR
        • CONSISTENCY
        • COPY TO
        • COPY FROM
        • DESCRIBE AGGREGATE
        • DESCRIBE CLUSTER
        • DESCRIBE SCHEMA
        • DESCRIBE FUNCTION
        • DESCRIBE INDEX
        • DESCRIBE KEYSPACE
        • DESCRIBE MATERIALIZED VIEW
        • DESCRIBE SEARCH INDEX
        • DESCRIBE TABLE
        • DESCRIBE TYPE
        • EXECUTE AS
        • EXPAND
        • EXIT
        • LOGIN
        • PAGING
        • SERIAL CONSISTENCY
        • SHOW
        • SOURCE
        • TIMING
        • TRACING
      • CQL commands
        • ALTER TABLE
        • ALTER TYPE
        • ALTER USER (Deprecated)
        • BATCH
        • CREATE AGGREGATE
        • CREATE CUSTOM INDEX
        • CREATE INDEX
        • CREATE TABLE
        • CREATE TYPE
        • CREATE USER (Deprecated)
        • DELETE
        • DROP INDEX
        • DROP TABLE
        • DROP TYPE
        • DROP USER (Deprecated)
        • GRANT
        • INSERT
        • RESTRICT
        • REVOKE
        • SELECT
        • TRUNCATE
        • UNRESTRICT
        • UNRESTRICT ROWS
        • UPDATE
        • USE
  • CQL for Astra DB Serverless
  • Planning
  • Data modeling
  • Data modeling concepts
Edit this Page

Data modeling concepts

Data modeling is a process that involves identifying the entities (items to be stored) and the relationships between entities.

To create your data model, identify the patterns used to access data and the types of queries to be performed. These two ideas inform the organization and structure of the data, and the design and creation of the database’s tables. Indexing the data can lead to either performance or degradation of queries, so understanding indexing is an important step in the data modeling process.

Data modeling in Cassandra uses a query-driven approach, in which specific queries are the key to organizing the data. Queries are the result of selecting data from a table; schema is the definition of how data in the table is arranged. Cassandra’s database design is based on the requirement for fast reads and writes, so the better the schema design, the faster data is written and retrieved.

In contrast, relational databases normalize data based on the tables and relationships designed, and then writes the queries that will be made. Data modeling in relational databases is table-driven, and any relationships between tables are expressed as table joins in queries.

Cassandra’s data model is a partitioned row store with tunable consistency. Tunable consistency means for any given read or write operation, the client application decides how consistent the requested data must be. Rows are organized into tables. The first component of a table’s primary key is the partition key. Within a partition, rows are clustered by the remaining columns of the key. Other columns can be indexed separately from the primary key. Because Cassandra is a distributed database, efficiency is gained for reads and writes when data is grouped together on nodes by partition. The fewer partitions that must be queried to get an answer to a question, the faster the response. Tuning the consistency level is another factor in latency, but is not part of the data modeling process.

Cassandra data modeling focuses on the queries. Throughout this topic, the example of Pro Cycling statistics demonstrates how to model the table schema for specific queries. The conceptual model for this data model shows the entities and relationships.

cqlCyclistConceptual

The entities and their relationships are considered during table design. Queries are best designed to access a single table, so all entities involved in a relationship that a query encompasses must be in the table. Some tables will involve a single entity and its attributes, like the first example shown below. Others will involve more than one entity and its attributes, such as the second example. Including all data in a single table contrasts with a relational database approach, where the data would be stored in two or more tables and foreign keys would be used to relate the data between the tables. Because Cassandra uses this single table-single query approach, queries can perform faster.

One basic query (Q1) for Pro Cycling statistics is a list of cyclists, including each cyclist’s id, firstname, and lastname. To uniquely identify a cyclist in the table, an id using UUID is used. For a simple query to list all cyclists a table that includes all the columns identified and a partition key (K) of id is created. The diagram below shows a portion of the logical model for the Pro Cycling data model.

cqlCyclistLogicalQuery1

A related query (Q2) searches for all cyclists by a particular race category. This query is more efficient if a table is created that groups all cyclists by category. Some of the same columns are required (id, lastname), but now the primary key of the table includes category as the partition key (K), and groups within the partition by the id ©. This choice ensures that unique records for each cyclist are created.

cqlCyclistLogicalQuery2

These are two simple queries; more examples will be shown to illustrate data modeling using CQL.

Notice that the main principle in designing the table is not the relationship of the table to other tables, as it is in relational database modeling. Data in Cassandra is often arranged as one query per table, and data is repeated amongst many tables, a process known as denormalization. Relational databases instead normalize data, removing as much duplication as possible. The relationship of the entities is important, because the order in which data is stored can greatly affect the ease and speed of data retrieval. The schema design captures much of the relationship between entities by including related attributes in the same table. Client-side joins in application code is used only when table schema cannot capture the complexity of the relationships.

Data modeling Data modeling analysis

General Inquiries: +1 (650) 389-6000 info@datastax.com

© DataStax | Privacy policy | Terms of use

DataStax, Titan, and TitanDB are registered trademarks of DataStax, Inc. and its subsidiaries in the United States and/or other countries.

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.

Kubernetes is the registered trademark of the Linux Foundation.

landing_page landingpage