Index types and use cases

The data stored in CQL tables can be queried by a variety of methods. The main method uses the partition key defined for a table, which is called primary indexing. To query non-primary key columns efficiently, you must create secondary indexes of those columns. Secondary indexing uses fast, efficient lookup of data that matches a given condition. You must create an index on a column to efficiently query the data in that column.

Indexing works best when there is a moderate cardinality of the indexed values, meaning there is a good variety of values in the rows but the rows aren’t excessively unique. The more unique values that exist in a particular column, the more overhead, on average, is required to query and maintain the index. For example, indexing a column where almost every value is different is inefficient and resource intensive. In contrast, a Boolean column with only two possible values is not useful for queries.

Think of a secondary index as a filter: You want the filter to be diverse enough to intelligently narrow the scope of the results without being so specific that it isn’t reusable.

Indexing support by product
Indexing type Astra DB Serverless Astra Managed Clusters (Classic) Apache Cassandra DataStax Enterprise Hyper-Converged Database

Primary indexing (primary key)

Supported

Supported

Supported

Supported

Supported

Storage-attached indexing (SAI)

Supported

Supported

Supported (5.0)

Supported

Supported

Original secondary indexing (2i)

Not supported

Supported

Supported

Supported

Supported

SSTable-attached indexing (SASI)

Not supported

Not supported

Experimental, not recommended

Experimental (5.1, 6.8), not recommended

Experimental, not recommended

DSE Search indexing

Not supported

Supported

Not supported

Supported

Not supported

Primary indexing

The primary index is the partition key column. All tables have a primary index.

A Cassandra storage engine uses the partition key to store rows of data, and the most efficient and fast lookup of data matches the partition key.

Secondary indexing

There are several types of secondary indexing available, but they aren’t interchangeable. Furthermore, not all secondary index types are supported by every database platform.

Secondary indexes are optional and must be created explicitly. Don’t create secondary indexes for every column; only index columns relevant to the queries expected by your data model.

Expect an increase of 2x latency and throughput on indexed tables compared to non-indexed tables.

SAI is the recommended indexing method for most use cases.

There are some use cases where 2i or DSE Search indexing are more appropriate, if supported by your database platform and required by your data model. For example, if you are using Leveled Compaction Strategy (LCS) with a small file size, you will have many SSTables, and 2i could be better than SAI.

SAI uses indexes for non-partition columns, and it attaches the indexing information to the SSTables that store the rows of data. For more information about creating and using SAI, see SAI performance and use cases.

SAI design and performance benefits

SAI was designed to be a more general indexing method that can be used for any type of query. It was also designed to address the issues with 2i and SASI, namely write amplification and index size.

SASI showed that in-memory indexing and flushing indexes with SSTables was an efficient use of the Cassandra write path. However, with SAI, the data is indexed when the mutation is acknowledged (fully committed to the table). This change, among other optimizations, vastly improved the write performance with SAI. Even compared to 2i, SAI results in a 40% increase in throughput and over 200% better write latencies.

SAI also dramatically improved index size, reducing storage requirements by 8x compared to SASI and 5x compared to 2i. This is because SAI uses two different types of indexing schemes based on the data type:

  • Text: Inverted indexes are created with terms broken into a dictionary. The biggest improvement is from the use of trie-based indexing, which offers better compression and, ultimately, smaller index sizes.

  • Numeric: Uses a data structure from Lucene called block kd-trees that offers excellent range query performance. A separate row ID list is maintained to optimize for token-order queries.

Secondary indexing (2i)

2i are the original built-in indexes for CQL. These indexes are a local index, stored in a hidden table on each node of a CQL cluster, separate from the table that contains the values being indexed.

Due to potential performance degradation, this indexing method is only recommended when used in conjunction with a partition key.

Performance degradation with inefficient or excessive use of 2i

Indexing is never free: The more you add, the more you impact write performance. In Cassandra databases, this manifests as write-amplification issues. When a mutation on an indexed column occurs, an indexing operation triggers reindexing data in a separate index file.

More indexes on a CQL table can dramatically increase disk activity during write operations. If a single Cassandra node gets too many writes, there is saturated disk activity. This activity destabilizes individual nodes, creating Cassandra cluster problems.

For this reason, 2i should be used sparingly. Index size is fairly linear, but it can be difficult to plan for the amount of disk space needed in an active cluster for storing and re-indexing indexes.

Don’t use a 2i index in the following situations:

For more information about creating and using 2i, see Create and use secondary indexes (2i).

SSTable-attached indexing (SASI)

SASI is not recommended. It is only available for backward compatibility in certain database platforms and versions.

SASI uses indexes for non-partition columns, and creates an index file for each SSTable that store the rows of data. SASI is the most appropriate indexing method for LIKE full text searches. For more information, see develop:indexing/sasi/sasi-create.adoc.

SSTable-attached Secondary Indexing (SASI) wasn’t designed as a general indexing method. Instead it was created to address two query types:

  • Finding rows based on partial data matching, such as wildcard or LIKE queries.

  • Range queries on sparse data, specifically timestamps, such as queries for all records within a given time range.

However, it suffers from performance issues due to storage requirements for these indexes. In addition, SASI is marked as experimental and hasn’t received feature improvements given the existence of other generalized indexing methods.

SASI indexes are not supported in DataStax Enterprise 6.9 or Astra DB.

DSE Search indexing

DSE Search is only available in Astra DB Classic, DataStax Enterprise 6.8, and DataStax Enterprise 6.9.

DSE Search isn’t required, but it can be useful for certain full-text search applications. If you are using DSE Search, use DSE Search indexing for your full text search queries.

The DSE Search indexing method is specific to Apache Solr-Lucene searches because DSE Search supports simple keyword searches as well as complex queries on multiple fields with faceted search results. This indexing method features tokenized text search for use with analyzers. For more information, see Search index commands.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2025 DataStax, an IBM Company | Privacy policy | Terms of use | Manage Privacy Choices

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com