Create a secondary index (2i)

Create indexes on one or more columns after defining a table. Secondary indexes created with 2i can be used to query a table using a column other than the table’s partition key.

In a production environment, certain columns might not be good choices, depending on their cardinality.

Do not add an storage-attached index (SAI) or a DSE Search index to the same table. See the difference between these index types in the overview.

Create simple 2i indexes on a table to see how indexing works. Start by creating a table, cycling.alt_stats, that yields statistics about cyclists:

CREATE TABLE IF NOT EXISTS cycling.cyclist_alt_stats (
  id UUID PRIMARY KEY,
  lastname text,
  birthday date,
  nationality text,
  weight float,
  w_units text,
  height float,
  first_race date,
  last_race date
);

Now create indexes on the columns birthday and nationality:

CREATE INDEX IF NOT EXISTS birthday_idx
ON cycling.cyclist_alt_stats (birthday);
CREATE INDEX IF NOT EXISTS nationality_idx
ON cycling.cyclist_alt_stats (nationality);

The following query attempts to retrieve the cyclists with a specified birthday and nationality. The query returns an error:

  • CQL

  • Result

SELECT *
FROM cycling.cyclist_alt_stats
WHERE birthday = '1982-01-29'
  AND nationality = 'Russia';
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Cannot execute this query as it might involve data
filtering and thus may have unpredictable performance.
If you want to execute this query despite the performance
unpredictability, use ALLOW FILTERING"

The indexes have been created on appropriate low-cardinality columns, but the previous query still fails. Why?

The answer lies with the partition key, which has not been defined. When you attempt a potentially expensive query, such as searching a range of rows, the database requires the ALLOW FILTERING directive. The error is not due to multiple indexes, but the lack of a partition key definition in the query.

  • CQL

  • Result

SELECT *
FROM cycling.cyclist_alt_stats
WHERE birthday = '1982-01-29'
  AND nationality = 'Russia'
ALLOW FILTERING;
 id                                   | birthday   | first_race | height | last_race  | lastname | nationality | w_units | weight
--------------------------------------+------------+------------+--------+------------+----------+-------------+---------+--------
 e0953617-07eb-4c82-8f91-3b2757981625 | 1982-01-29 | 1998-02-15 |   1.78 | 2017-04-16 |    BRUTT |      Russia |      kg |     68

(1 rows)

Thus, one of the difficulties of using 2is is illustrated. SAI is almost always a better option.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com