Using a secondary index

Use CQL to create a secondary index on a column after defining a table.

Create indexes on a column after defining a table. Secondary indexes are used to query a table using a column that cannot normally be queried. DataStax Enterprise supports indexing a collection column and indexing a static column.

Secondary indexes can impact performance greatly. The index table is stored on each node in a cluster, so a query involving a secondary index can rapidly become a performance problem if multiple nodes are accessed. A general rule is to index a column with low cardinality of few values. Before creating an index, be aware of when not to create an index.

Important: DSE Search can provide a more robust solution for indexing data. Learn about managing search indexes and how to configure DSE Search. Do not add a secondary index and a search index to the same table.

Procedure

  • The table rank_by_year_and_name stores the rank of cyclists for races.
    CREATE TABLE IF NOT EXISTS cycling.rank_by_year_and_name (
      race_year int,
      race_name text,
      cyclist_name text,
      rank int,
      PRIMARY KEY ((race_year, race_name), rank)
    );
  • Both race_year and race_name must be specified in a query because those columns comprise the partition key.
    SELECT *
    FROM cycling.rank_by_year_and_name
    WHERE race_year = 2014
      AND race_name  = 'Tour of Japan - Stage 4 - Minami > Shinshu';
     race_year | race_name                                  | rank | cyclist_name
    -----------+--------------------------------------------+------+----------------------
          2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |        Daniel MARTIN
          2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 | Johan Esteban CHAVES
          2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    3 |      Benjamin PRADES
    
    (3 rows)
  • A logical query to try is a listing of the rankings for a particular year. Because the table has a composite partition key with race_year and race_name, the following query fails because only the race_year column is specified in the WHERE clause.
    SELECT *
    FROM cycling.rank_by_year_and_name
    WHERE race_year = 2014;
    InvalidRequest: Error from server: code=2200 [Invalid query]
    message="Cannot execute this query as it might involve data
    filtering and thus may have unpredictable performance.
    If you want to execute this query despite the performance
    unpredictability, use ALLOW FILTERING"
  • In the following example, an index is created for the race year and the query succeeds. An index name is optional and must be unique within a keyspace. If you do not provide an index name, a name like race_year_idx is automatically assigned. Typically, you should explicitly set the index name.
    CREATE INDEX IF NOT EXISTS race_year_idx ON
    cycling.rank_by_year_and_name (race_year);
    SELECT *
    FROM cycling.rank_by_year_and_name
    WHERE race_year = 2014;
     race_year | race_name                                  | rank | cyclist_name
    -----------+--------------------------------------------+------+----------------------
          2014 |                        4th Tour of Beijing |    1 |    Phillippe GILBERT
          2014 |                        4th Tour of Beijing |    2 |        Daniel MARTIN
          2014 |                        4th Tour of Beijing |    3 | Johan Esteban CHAVES
          2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |        Daniel MARTIN
          2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 | Johan Esteban CHAVES
          2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    3 |      Benjamin PRADES
    
    (6 rows)
  • An index can also be added for a clustering column. In the following example, an index is created on the rank column, and rank is specified in the query's WHERE clause.
    CREATE INDEX IF NOT EXISTS rank_idx
    ON cycling.rank_by_year_and_name (rank);
    SELECT *
    FROM cycling.rank_by_year_and_name
    WHERE rank = 1;
     race_year | race_name                                  | rank | cyclist_name
    -----------+--------------------------------------------+------+-------------------
          2014 |                        4th Tour of Beijing |    1 | Phillippe GILBERT
          2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |     Daniel MARTIN
          2015 |   Giro d'Italia - Stage 11 - Forli > Imola |    1 |     Ilnur ZAKARIN
          2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |   Benjamin PRADES
    
    (4 rows)