• Glossary
  • Support
  • Downloads
  • DataStax Home
Get Live Help
Expand All
Collapse All

CQL for DSE

  • Overview
    • Cassandra structure
    • About CQL
    • CQL quick reference
    • Known issues
  • Getting started
    • Vector Search Quickstart
    • CQL Quickstart
    • SAI Quickstart
    • DSE Search Indexing Quick Start
  • Planning
    • Data modeling
      • Data modeling concepts
      • Data modeling analysis
  • Connecting
    • cqlsh (startup options)
      • Accessing data using CQL
        • Starting the standalone cqlsh tool
        • Starting cqlsh
    • Developing
      • Introduction to CQL
      • Managing keyspaces
        • Keyspace concepts
        • Create a keyspace
        • Check a keyspace
        • Alter a keyspace
        • Drop a keyspace
      • Managing tables
        • Table concepts
        • Data types
        • Create a table
        • Check a table
        • Alter a table
        • Truncate a table
        • Drop a table
      • Managing columns
        • Managing columns
        • Create a table column
        • Check column existence
        • Alter a table column
        • Drop a column
        • User-defined type (UDT) overview
          • Create a user-defined type (UDT)
          • Check for user-defined type (UDT) existence
          • Alter a user-defined type (UDT)
          • Drop a user-defined type (UDT)
        • User-defined functions (UDFs) and user-defined aggregates (UDAs) overview
          • Create a user-defined function (UDF)
          • Check a user-defined function (UDF)
          • Drop a user-defined function (UDF)
          • Create a user-defined aggregate function (UDA)
          • Checking a user-defined aggregate (UDA)
          • Drop a user-defined aggregate (UDA)
      • Managing Materialized views
        • Materialized views concepts
        • Known limitations of materialized views
        • Create a materialized view (MV)
        • Alter a materialized view (MV)
        • Drop a materialized view
        • Frequently asked questions about materialized views
      • Indexing Tables
        • Indexing concepts
        • Storage-Attached Indexing (SAI) Overview
          • SAI concepts
          • SAI Quickstart
          • SAI FAQ
          • Create SAI index
          • Alter SAI index
          • Drop SAI index
          • Querying with SAI
          • Configure SAI indexes
          • Monitor SAI indexes
          • SAI write path and read path
        • Secondary indexes (2i) overview
          • Secondary index (2i) concepts
          • When to use an index
          • Building and maintaining secondary indexes (2i)
          • Create a secondary index (2i)
            • Create a 2i on a collection column
          • Check secondary index (2i) existence
          • Alter a secondary index (2i)
          • Drop a secondary index (2i)
        • Search indexes overview
          • Search index commands
          • Adjusting timeout for index management
          • Creating a search index
          • Configuring search index joins
          • Joining cores
          • Reloading the search index
          • Removing a search index
          • Managing search index fields
            • Syntax for changing schema settings
            • Search index schema field types
            • Adding a new field type
            • Adding a column to the index
            • Indexing tuples and UDTs fields
              • Tuple configuration example
              • UDT configuration example
              • Nesting tuples and UDTs
              • Tuples and UDTs as CQL map values
            • Indexing map columns
            • Indexing a column for different analysis
            • Dropping columns from the index
          • Updating the index after data expires (TTL)
          • Inserting, updating, and deleting data
          • Search indexing examples
        • Search index examples
          • Search index examples
            • Creating a demo keyspace for tutorials
            • Indexing and querying polygons
      • Inserting and updating data
        • Insert simple data
        • Set column
        • List fields
        • Map fields
        • Tuple
        • User-defined type (UDT) column
        • Insert static data
        • UUID and timeuuid column
        • Date column
        • Time column
        • Timestamp column
        • Duration column
        • Update counter
        • Lightweight transactions (LWTs)
        • Inserting JSON formatted values
        • Batching inserts and updates
          • Batching inserts, updates, and deletes
          • Good use of BATCH statement
          • Misuse of BATCH statement
        • Importing data from a CSV file
        • Setting the time-to-live (TTL) for value
          • Inserting, updating, and deleting data
      • Deleting data
        • Deleting values from a column or entire row
        • Expiring data with time-to-live
        • Expiring data with TTL example
      • Querying and retrieving data
        • Using a SELECT statement
        • Restricting queries using WHERE clauses
          • Partition keys
          • Clustering columns
          • Retrieval using the IN keyword
          • Sorting and limiting results
        • Returning data from collection columns
        • Aggregating results
        • Using user-defined functions (UDF)
        • Applying user-defined aggregate (UDA) functions
          • Testing performance impact using tracing
        • Formatting query results as JSON
        • Retrieving the write timestamp
        • Using tracing
          • Tracing performance
        • DSE Search index querying
        • Search index filtering best practices
          • Search index syntax
          • search-index:si-query-syntax.adoc#siIndexedColumns
          • search-index:si-query-syntax.adoc#basicSolrSyntax
          • search-index:si-query-syntax.adoc#queriesJSON
          • search-index:si-query-syntax.adoc#siEscapeCharacters
        • Native CQL search queries
        • Identifying the partition key
        • Filtering on terms
          • Filtering on words, phrases, or substrings
          • Fuzzy search
        • UDT query examples
        • Querying a CQL collection set
        • Using date ranges in solr_query
        • Geospatial queries for Point and LineString
        • Spatial queries with polygons require JTS
        • Using dynamic fields
        • Limiting results and paging
        • Restricted query routing
    • Securing overview
      • Security quickstart
      • Using cqlsh with authentication
      • System tables for authentication and authorization
  • Cycling Examples
    • Setting up the cycling keyspace
    • User-defined types (UDTs)
      • basic_info
      • basic_info_withTTL
      • fullname
      • race
    • User-defined functions (UDFs) and user-defined aggregates (UDAs)
      • flog
      • left
      • average_state
      • average_final
      • Cycling user-defined aggregate team_average
    • Tables
      • birthday_list
      • calendar
      • comments
      • country_flag
      • cyclist_alt_stats
      • cyclist_base
      • cyclist_career_teams
      • cyclist_category
      • cyclist_emails
      • cyclist_expenses
      • cyclist_id
      • cyclist_name
      • cyclist_points
      • cyclist_races
      • cyclist_sponsors
      • cyclist_stats
      • cyclist_teams
      • events
      • popular_count
      • race_sponsors
      • race_starts
      • race_times
      • race_winners
      • rank_by_year_and_name
      • team_average
      • upcoming_calendar
    • Materialized views (MVs)
      • cyclist_by_age-mv
    • Search indexes
      • Search index examples
      • comments search index
    • SASI indexes
      • Index on cyclist_name
    • Access control
      • Cycling internal
    • Cycling queries
      • Aggregating cycling data
    • Reference
      • CQL quick reference
      • Understanding the CQL command syntax
      • Data types
        • Creating a blob column
        • Date, time, and timestamp format
        • Solr field type reference for DSE Search
      • Operators
      • CQL native functions
      • CQL native aggregates
      • System and virtual tables
        • System tables
          • Keyspace tables
          • Querying system tables
          • Functions, aggregates, and user types
        • Virtual tables
          • Virtual keyspaces and tables
          • Virtual tables for SAI indexes and SSTables
      • cqlsh (startup options)
        • Configuring a cqlshrc file
        • Starting cqlsh on a search node
      • CQL shell (cqlsh) reference
        • CAPTURE
        • CLEAR
        • CONSISTENCY
        • COPY TO
        • COPY FROM
        • DESCRIBE AGGREGATE
        • DESCRIBE CLUSTER
        • DESCRIBE SCHEMA
        • DESCRIBE FUNCTION
        • DESCRIBE INDEX
        • DESCRIBE KEYSPACE
        • DESCRIBE MATERIALIZED VIEW
        • DESCRIBE SEARCH INDEX
        • DESCRIBE TABLE
        • DESCRIBE TYPE
        • EXECUTE AS
        • EXPAND
        • EXIT
        • LOGIN
        • PAGING
        • SERIAL CONSISTENCY
        • SHOW
        • SOURCE
        • TIMING
        • TRACING
      • CQL commands
        • ALTER KEYSPACE
        • ALTER MATERIALIZED VIEW
        • ALTER ROLE
        • ALTER SEARCH INDEX CONFIG
        • ALTER SEARCH INDEX SCHEMA
        • ALTER TABLE
        • ALTER TYPE
        • ALTER USER (Deprecated)
        • BATCH
        • COMMIT SEARCH INDEX
        • CREATE AGGREGATE
        • CREATE CUSTOM INDEX
        • CREATE FUNCTION
        • CREATE INDEX
        • CREATE KEYSPACE
        • CREATE MATERIALIZED VIEW
        • CREATE ROLE
        • CREATE SEARCH INDEX
        • CREATE TABLE
        • CREATE TYPE
        • CREATE USER (Deprecated)
        • DELETE
        • DROP AGGREGATE
        • DROP FUNCTION
        • DROP INDEX
        • DROP KEYSPACE
        • DROP MATERIALIZED VIEW
        • DROP ROLE
        • DROP SEARCH INDEX
        • DROP TABLE
        • DROP TYPE
        • DROP USER (Deprecated)
        • GRANT
        • INSERT
        • LIST PERMISSIONS
        • LIST ROLES
        • LIST USERS (Deprecated)
        • REBUILD SEARCH INDEX
        • RELOAD SEARCH INDEX
        • RESTRICT
        • RESTRICT ROWS
        • REVOKE
        • SELECT
        • TRUNCATE
        • UNRESTRICT
        • UNRESTRICT ROWS
        • UPDATE
        • USE
  • CQL for DSE
  • Developing
  • Managing tables
  • Table concepts
Edit this Page

Table concepts

A table is a database object that stores the data. It is analogous to a SQL table, although the data is stored much differently in Apache Cassandra. Tables can be created, modified, or dropped at runtime without blocking updates or queries.

Tables are defined as rows of columns. Each column has a required unique data type defined at table creation. Rows are defined by the choice of which column or columns are defined as the primary key of the table. Data is stored in a cell which is the intersection of a row and column.

Column characteristics

Column definition comprises one part of table definition. The columns are defined in a comma-delimited list of <column-name> <data-type> pairs. A common column definition is:

username text,

Data types

Most data types are straightforward in definition, and are similar to the common data types in other languages.

General data type

CQL data types

Characters

ascii, text, or varchar

Integers

int, tinyint, smallint, bigint, or varint

Vectors

Array type of float32

Decimals

decimal, float, double

Date and time

date, DateRangeType, duration, time, timestamp

Unique identifiers

uuid, timeuuid

Specialized

blob, boolean, counter

Geospatial

PointType, LineStringType, PolygonType

Collections - frozen or non-frozen

list, set, map, tuple, user-defined type (UDT)

See the data type reference for more detailed information.

Primary Key

The other part of a table definition that must be included is the primary key. It is declared either within a column for a table with a single partition key:

username text PRIMARY KEY,

Or, the definition can be near the end of the table creation command with the PRIMARY KEY command:

PRIMARY KEY (username, age)

A primary key in Cassandra consists of one or more partition keys and zero or more clustering column components. The order of these components in the definition always puts the partition key first and then the clustering column or columns.

The primary key is defined when the table is created and cannot be altered. If you must change the primary key, you’ll need create a new table schema and write the existing data to the new table.

The definition of a table’s primary key is critical. Carefully model how data in a table will be inserted and retrieved before choosing which columns to define in the primary key. When selecting the table’s primary key, consider:

  • the size of the partitions

  • the order of the data within partitions

  • the distribution of the partitions among the nodes of the cluster

  • the fact that a primary key cannot have a NULL value

Partition key

Cassandra is a partition row store, so the first element of the primary key, the partition key, specifies which node in the Cassandra cluster will replicate a particular table row. Thus, the primary key identifies the location and order of stored data.

At the minimum, the primary key must consist of a partition key. One or more columns are used to define a partition key. If one column, such as age is defined as the partition key, then all 35 year olds will be sorted into the same storage location.

In contrast to a simple partition key, a composite partition key uses two or more columns to identify where data resides. If more than one column is specified, the composite partition key splits a data set so that related data is stored on separate partitions.

Importance of the partition key

Data is retrieved using the partition key. Keep in mind that to retrieve data from the table, values for all columns defined in the partition key have to be supplied unless an index is created.

The database stores an entire row of data on a node by partition key. If there is too much data in a single partition and data needs to be spread among multiple nodes, use a composite partition key. Using more than one column for the partition key breaks the data into chunks, or buckets. These columns form logical sets inside a partition to facilitate retrieval. The data is still grouped, but in smaller chunks.

This method can be effective if a cluster experiences hotspotting, or congestion in writing data to one node repeatedly, because a partition is heavily writing. Cassandra is often used for time series data, and hotspotting can be a real issue.

For example, if you want to sort data by both age and name, specifying both columns as part of the partition key will store rows that store all the 35 year old Janes into a similar storage location, whether they live in Seattle, Chicago, or Boston.

Clustering column

The second part of a primary key is optional, and consists of one or more clustering columns. Clustering columns order the data so that multiple rows within a single partition are clustered in a defined order. The clustering columns do not dictate the storage location of the data, only the order of the data within a partition. However, this feature can be very useful, for example, if the data stores time sequences, or groups of network equipment by region.

If a clustering column is specified in the primary key, the primary key is a compound primary key. The table is comprised of multi-row partitions.

Importance of clustering columns

Using a compound primary key creates a table that can be queried to return sorted results. If the pro cycling example was designed for a relational database, a cyclists table would be created with a foreign key to JOIN with the races table when querying. In Cassandra, the data is denormalized because joins are not performant in a distributed system.

Grouping data in tables using a clustering column or columns is analogous to JOINs in a relational database, but clustering columns are much more performant because only one table is accessed. On a physical node, when rows for a partition key are stored in order based on the clustering columns, retrieval of rows is very efficient.

Managing tables Create a table

General Inquiries: +1 (650) 389-6000 info@datastax.com

© DataStax | Privacy policy | Terms of use

DataStax, Titan, and TitanDB are registered trademarks of DataStax, Inc. and its subsidiaries in the United States and/or other countries.

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.

Kubernetes is the registered trademark of the Linux Foundation.

landing_page landingpage