Using a secondary index

Use CQL to create a secondary index on a column after defining a table.

Create indexes on a column after defining a table. Secondary indexes are used to query a table using a column that cannot normally be queried. DataStax Enterprise supports indexing a collection column and indexing a static column.

Secondary indexes are problematic to use and can impact performance greatly. The index table is stored on each node in a cluster, so a query involving a secondary index can rapidly become a significant performance problem if multiple nodes are accessed.

A general rule is to index a column with low cardinality of few values. Before creating an index, be aware of when and when not to create an index.

Important: DSE Search can provide a more robust solution for indexing data. Learn about managing search indexes and how to configure DSE Search. Do not add a secondary index and a search index to the same table.

Procedure

  • The table rank_by_year_and_name can yield the rank of cyclists for races.
    CREATE TABLE cycling.rank_by_year_and_name ( 
      race_year int, 
      race_name text, 
      cyclist_name text, 
      rank int, 
      PRIMARY KEY ((race_year, race_name), rank) 
    );
  • Both race_year and race_name must be specified as these columns comprise the partition key.
    SELECT *
    FROM cycling.rank_by_year_and_name
    WHERE race_year = 2015
      AND race_name = 'Tour of Japan - Stage 4 - Minami > Shinshu';
    race_year | race_name                                  | rank | cyclist_name
    ----------+--------------------------------------------+------+-----------------
         2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 | Benjamin PRADES
         2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 |     Adam PHELAN
         2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    3 |    Thomas LEBAS
  • A logical query to try is a listing of the rankings for a particular year. Because the table has a composite partition key, this query fails if only the first column is used in the conditional operator.
    SELECT *
    FROM cycling.rank_by_year_and_name
    WHERE race_year = 2015;
    InvalidRequest: code=2200 [Invalid query] message=
      "Partition key parts: race_name must be restricted as other parts are"
  • In this example, an index is created for the race year and the query succeeds. An index name is optional and must be unique within a keyspace. If you do not provide an index name, DSE assigns a name like race_year_idx.
    CREATE INDEX ryear ON
    cycling.rank_by_year_and_name (race_year);
    
    SELECT *
    FROM cycling.rank_by_year_and_name
    WHERE race_year = 2015;
    race_year | race_name                                  | rank | cyclist_name
    ----------+--------------------------------------------+------+-----------------
         2015 |   Giro d'Italia - Stage 11 - Forli > Imola |    1 |   Ilnur ZAKARIN
         2015 |   Giro d'Italia - Stage 11 - Forli > Imola |    2 | Carlos BETANCUR
         2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 | Benjamin PRADES
         2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 |     Adam PHELAN
         2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    3 |    Thomas LEBAS
  • A clustering column can also be used to create an index. An index is created on rank, and used in a query.
    CREATE INDEX rrank ON
    cycling.rank_by_year_and_name (rank);
    
    SELECT *
    FROM cycling.rank_by_year_and_name
    WHERE rank = 1;
    race_year | race_name                                  | rank | cyclist_name
    ----------+--------------------------------------------+------+-------------------
         2014 |                        4th Tour of Beijing |    1 | Phillippe GILBERT
         2015 |   Giro d'Italia - Stage 11 - Forli > Imola |    1 |     Ilnur ZAKARIN
         2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |   Benjamin PRADES