Using a secondary index

Use CQL to create a secondary index on a column after defining a table.

Create indexes on a column after defining a table. Secondary indexes are used to query a table using a column that cannot normally be queried. DataStax Enterprise supports indexing a collection column and indexing a static column.

Secondary indexes can impact performance greatly. The index table is stored on each node in a cluster, so a query involving a secondary index can rapidly become a performance problem if multiple nodes are accessed. A general rule is to index a column with low cardinality of few values. Before creating an index, be aware of when not to create an index.

Important: DSE Search can provide a more robust solution for indexing data. Learn about managing search indexes and how to configure DSE Search. Do not add a secondary index and a search index to the same table.

Procedure

The table rank_by_year_and_name stores the rank of cyclists for races.

CREATE TABLE IF NOT EXISTS cycling.rank_by_year_and_name (
  race_year int,
  race_name text,
  cyclist_name text,
  rank int,
  PRIMARY KEY ((race_year, race_name), rank)
);

Both race_year and race_name must be specified in a query because those columns comprise the partition key.

SELECT *
FROM cycling.rank_by_year_and_name
WHERE race_year = 2014
  AND race_name  = 'Tour of Japan - Stage 4 - Minami > Shinshu';

 race_year | race_name                                  | rank | cyclist_name
-----------+--------------------------------------------+------+----------------------
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |        Daniel MARTIN
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 | Johan Esteban CHAVES
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    3 |      Benjamin PRADES

(3 rows)

A logical query to try is a listing of the rankings for a particular year. Because the table has a composite partition key with race_year and race_name, the following query fails because only the race_year column is specified in the WHERE clause.

SELECT *
FROM cycling.rank_by_year_and_name
WHERE race_year = 2014;

InvalidRequest: Error from server: code=2200 [Invalid query]
message="Cannot execute this query as it might involve data
filtering and thus may have unpredictable performance.
If you want to execute this query despite the performance
unpredictability, use ALLOW FILTERING"

In the following example, an index is created for the race year and the query succeeds. An index name is optional and must be unique within a keyspace. If you do not provide an index name, a name like race_year_idx is automatically assigned. Typically, you should explicitly set the index name.

CREATE INDEX IF NOT EXISTS race_year_idx ON
cycling.rank_by_year_and_name (race_year);

SELECT *
FROM cycling.rank_by_year_and_name
WHERE race_year = 2014;

 race_year | race_name                                  | rank | cyclist_name
-----------+--------------------------------------------+------+----------------------
      2014 |                        4th Tour of Beijing |    1 |    Phillippe GILBERT
      2014 |                        4th Tour of Beijing |    2 |        Daniel MARTIN
      2014 |                        4th Tour of Beijing |    3 | Johan Esteban CHAVES
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |        Daniel MARTIN
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 | Johan Esteban CHAVES
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    3 |      Benjamin PRADES

(6 rows)

An index can also be added for a clustering column. In the following example, an index is created on the rank column, and rank is specified in the query's WHERE clause.

CREATE INDEX IF NOT EXISTS rank_idx
ON cycling.rank_by_year_and_name (rank);

SELECT *
FROM cycling.rank_by_year_and_name
WHERE rank = 1;

 race_year | race_name                                  | rank | cyclist_name
-----------+--------------------------------------------+------+-------------------
      2014 |                        4th Tour of Beijing |    1 | Phillippe GILBERT
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |     Daniel MARTIN
      2015 |   Giro d'Italia - Stage 11 - Forli > Imola |    1 |     Ilnur ZAKARIN
      2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |   Benjamin PRADES

(4 rows)