Secondary index (2i) concepts

>>>>>>> Stashed changes :description: Use secondary indexing (2i) to create secondary indexes on a table.

Secondary indexing (2i) is the original secondary indexing of CQL, but today, it suffers from poor performance and latency. If you are building new indexing, use SAI indexing instead.

2i adds column-level indexes to any CQL table column of any CQL data type, except for a counter column. However, the indexes are locally built on each CQL node in a cluster, so using 2i for queries results in poor performance.

A number of techniques exist for guarding against the undesirable scenario where data might be incorrectly retrieved during a query based on stale values in an index.

There are distinct conditions about when and when not to use a 2i index.

Problems using a high-cardinality column index

A high-cardinality column is a column that has many distinct values, such as a column that contains a unique ID for each row. If you create an index on a high-cardinality column, which has many distinct values, a query between the fields incurs many seeks for very few results. In the table with a billion cycling races, looking up races by cyclist winners (a value that is typically unique for each song) instead of by their team is likely to be very inefficient.

It would probably be more efficient to manually maintain the table as a form of an index instead of using the built-in index. For columns containing unique data, it is sometimes better for performance to use an index for convenience, as long as the query volume to the table having an indexed column is moderate and not under constant load.

Conversely, creating an index on an extremely low-cardinality column, such as a boolean column, does not make sense. Each value in the index becomes a single row in the index, resulting in a huge row for all the false values, for example. Indexing a multitude of indexed columns having foo = true and foo = false is not useful.

An index on a column with low cardinality can boost read performance because the index is significantly smaller than the column. An index for a high-cardinality column may reduce performance. If your application requires a search on a high-cardinality column, a materialized view is ideal.

Problems using an index on a frequently updated or deleted column

The database stores tombstones in the index until the tombstone limit reaches 100K cells. After exceeding the tombstone limit, the query that uses the indexed value will fail.

Problems using an index to look for a row in a large partition unless narrowly queried

A query on an indexed column in a large cluster typically requires collating responses from multiple data partitions. The query response slows down as more machines are added to the cluster. When looking for a row in a large partition, narrow the search to avoid query performance degradation.


Create secondary indexes

How to create secondary indexes.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com