Index types and use cases

The data stored in CQL tables is queried through primary and secondary indexes:

Primary indexing

The main method for querying tables is primary indexing, which uses the table’s defined partition key. This covers the typical SELECT statements that filter on primary key columns.

All tables support primary indexing queries because all tables have a partition key and a primary index.

The Cassandra storage engine uses the partition key to store rows of data, and the most efficient and fastest data lookups are matches on the partition key.
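As a minimal sketch of a primary index lookup, the following CQL defines a table and queries it by partition key. The keyspace demo, the table users, its columns, and the UUID value are hypothetical names chosen for illustration, and the keyspace is assumed to already exist.

```cql
-- Hypothetical table: user_id is the partition key (and the whole primary key).
CREATE TABLE IF NOT EXISTS demo.users (
    user_id uuid,
    name text,
    age int,
    PRIMARY KEY (user_id)
);

-- Primary index lookup: an equality match on the partition key.
SELECT name, age
FROM demo.users
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
```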

Secondary (auxiliary) indexing

To query non-primary key columns efficiently, you must create secondary indexes on those columns. Secondary indexing provides fast, efficient lookup of data that matches a given condition, but it requires storage overhead and deliberate creation and maintenance of the indexes.

Secondary indexes are optional and must be created explicitly. Don’t create secondary indexes for every column; only index columns relevant to the queries expected by your data model.
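The following sketch shows a secondary index being created explicitly on a single column that a query needs. The keyspace demo, the table users, the column country, and the index name users_country_idx are hypothetical. Which index implementation a plain CREATE INDEX statement produces depends on the platform and its configuration, so treat this as an illustration of the pattern rather than a recommendation of a specific index type.

```cql
-- Hypothetical table with one non-primary key column that queries filter on.
CREATE TABLE IF NOT EXISTS demo.users (
    user_id uuid PRIMARY KEY,
    name text,
    country text
);

-- Index only the column that the expected queries need.
CREATE INDEX IF NOT EXISTS users_country_idx ON demo.users (country);

-- With the index in place, this query can filter on the non-primary key column.
SELECT user_id, name FROM demo.users WHERE country = 'FR';
```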

When compared to non-indexed tables, tables with secondary indexes typically experience significantly higher latency and lower throughput; in some cases, latency doubles and throughput is halved. Often, system resources must be adjusted to provide additional capacity for the indexes.

Most types of secondary indexing work best when the indexed values have moderate cardinality, meaning the rows contain a mix of repeated and distinct values, but the values aren’t excessively unique. The more unique values that exist in a particular column, the more overhead, on average, is required to query and maintain the index. For example, indexing a column where almost every value is different is typically inefficient and resource intensive. In contrast, indexing a boolean column with only two possible values is not useful for queries, because each value matches a large share of the rows.

In most cases, secondary indexes are like filters where you want the filter to be diverse enough to be useful, but not so specific that it isn’t reusable for different queries.

Supported index types

There are several types of secondary indexing available, but they aren’t interchangeable. Furthermore, not all secondary index types are supported by every database platform.

Indexing support by product
Indexing type	Astra DB Serverless	Astra Managed Clusters	Apache Cassandra®	DataStax Enterprise (DSE)	Hyper-Converged Database (HCD)
Primary indexing (primary key)	Supported	Supported	Supported	Supported	Supported
Storage-attached indexing (SAI)	Supported	Supported	Supported (5.0)	Supported (6.8, 6.9)	Supported
Original secondary indexing (2i)	Not supported	Supported	Supported	Supported	Supported
SSTable-attached indexing (SASI)	Not supported	Not supported	Experimental, not recommended	Experimental (5.1, 6.8), not recommended	Not supported
DSE Search indexing	Not supported	Supported	Not supported	Supported	Not supported

Secondary indexing (2i)

2i indexes, also known as index lookups, are the original built-in indexing method for Cassandra. Each 2i index is a local index, stored in an internal (hidden) table on each node of a cluster, separate from the table that contains the values being indexed.

Due to potential performance degradation, 2i is only recommended when used in conjunction with a partition key.
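As a sketch of what using 2i in conjunction with a partition key can look like, the following CQL contrasts a query restricted to one partition with a query on the indexed column alone. The keyspace demo, the table orders, its columns, and the index name are hypothetical.

```cql
-- Hypothetical table: customer_id is the partition key, order_id a clustering column.
CREATE TABLE IF NOT EXISTS demo.orders (
    customer_id uuid,
    order_id timeuuid,
    status text,
    PRIMARY KEY (customer_id, order_id)
);

CREATE INDEX IF NOT EXISTS orders_status_idx ON demo.orders (status);

-- Partition key plus indexed column: the lookup is limited to one partition.
SELECT order_id
FROM demo.orders
WHERE customer_id = 123e4567-e89b-12d3-a456-426614174000
  AND status = 'shipped';

-- Indexed column only: valid CQL, but the query is not limited to one partition.
SELECT order_id FROM demo.orders WHERE status = 'shipped';
```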

Don’t use 2i indexes in the following situations:

On high-cardinality columns, where a query must search a huge volume of records to return a small number of results. See Problems using 2i for a high-cardinality column.
In tables that use a counter column.
On columns with frequently updated or deleted values. See Problems using 2i for a frequently updated or deleted column.
For broad queries on large partitions. See Problems using 2i to look for a row in a large partition unless narrowly queried.
For queries that use non-equality or range operators.

Indexing is never free: The more you add, the more you impact write performance. In Cassandra databases, this manifests as write-amplification issues. When a mutation on an indexed column occurs, an indexing operation triggers reindexing of data in a separate index file.

More indexes on a table can dramatically increase disk activity during write operations. If a single node gets too many writes, I/O saturation can occur. This destabilizes individual nodes, creating cluster-wide performance issues.

For this reason, 2i should be used sparingly. Index size grows fairly linearly, but it can be difficult to plan for the amount of disk space needed in an active cluster for storing and rebuilding indexes.

For more information about creating and using 2i, see Create and use secondary indexes (2i).

SSTable-attached indexing (SASI)

SASI is not recommended. SASI indexes are deprecated in DSE 6.8 and 5.1, but they are enabled by default.

SASI wasn’t designed as a general indexing method. It was an alternative implementation of 2i that was designed for a specific use case (LIKE full-text searches) on an early version of Cassandra with a deprecated API. SASI received limited testing, and it was known to have numerous issues. Despite fixes and improvements in later versions, it is considered unreliable and risks returning inconsistent results.

SASI uses indexes for non-partition columns, and it creates an index file for each SSTable that stores the rows of data. For more information about creating and using SASI, see Create SASI index.
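For reference only, a SASI index for the LIKE use case described above might be defined as in the following sketch. The keyspace demo, the table articles, and the index name are hypothetical, and on some platforms SASI may need to be enabled in the node configuration before the statement is accepted. This is shown to make the syntax recognizable in existing schemas, not to suggest adopting SASI.

```cql
-- Hypothetical table holding text to search.
CREATE TABLE IF NOT EXISTS demo.articles (
    article_id uuid PRIMARY KEY,
    title text
);

-- SASI is a custom index class; CONTAINS mode allows substring matches.
CREATE CUSTOM INDEX IF NOT EXISTS articles_title_sasi
ON demo.articles (title)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'CONTAINS'};

-- LIKE query served by the SASI index.
SELECT article_id, title FROM demo.articles WHERE title LIKE '%cassandra%';
```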

DSE Search indexing

DSE Search isn’t required, but it can be useful for certain full-text search applications. If you are using DSE Search, use DSE Search indexing for your full-text search queries.

The DSE Search indexing method is specific to Apache Solr and Lucene searches. DSE Search supports simple keyword searches as well as complex queries on multiple fields with faceted search results, including full-text search, range search, and exact search. This indexing method features tokenized text search for use with analyzers.
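A minimal sketch of a search index and a keyword query in DSE might look like the following. It assumes a DSE cluster with search-enabled nodes and a hypothetical existing table demo.articles with a text column named title. The query string is only an illustrative Solr expression.

```cql
-- Create a search index on the hypothetical table (DSE only).
CREATE SEARCH INDEX IF NOT EXISTS ON demo.articles;

-- Keyword search through the solr_query pseudo-column.
SELECT article_id, title
FROM demo.articles
WHERE solr_query = 'title:cassandra';
```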

For more information, see Search index commands.

Build and maintain secondary indexes

After you create an index on a table column, the index is built in the background automatically without blocking reads or writes.

Client-maintained tables that serve as indexes must be created and populated manually. For example, if an age column is indexed by creating a table named by_age, then your client application must populate the by_age table with supporting data from other tables, such as a name table that uses id as the primary key.
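The by_age example can be sketched as follows. The keyspace demo, the column definitions, and the sample values are assumptions made for illustration. Using a logged batch to write both tables is one possible way for the client to keep them in step, not the only one.

```cql
-- Source table: id is the primary key.
CREATE TABLE IF NOT EXISTS demo.name (
    id uuid PRIMARY KEY,
    name text,
    age int
);

-- Client-maintained index table: age is the partition key, id keeps rows distinct.
CREATE TABLE IF NOT EXISTS demo.by_age (
    age int,
    id uuid,
    name text,
    PRIMARY KEY (age, id)
);

-- The client application writes to both tables.
BEGIN BATCH
    INSERT INTO demo.name (id, name, age)
    VALUES (123e4567-e89b-12d3-a456-426614174000, 'Ada', 36);
    INSERT INTO demo.by_age (age, id, name)
    VALUES (36, 123e4567-e89b-12d3-a456-426614174000, 'Ada');
APPLY BATCH;

-- Lookup by age is now a partition key query on the index table.
SELECT id, name FROM demo.by_age WHERE age = 36;
```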

To perform a hot rebuild of an index, use the nodetool rebuild_index command.
