Using a secondary index

Using CQL to create a secondary index on a column after defining a table.

Using CQL, you can create an index on a column after defining a table. In Cassandra 2.1 and later, you can index a collection column. In Cassandra 3.4 and later, static columns can be indexed. Secondary indexes are used to query a table using a column that is not normally queryable.

Secondary indexes are tricky to use and can impact performance greatly. The index table is stored on each node in a cluster, so a query involving a secondary index can rapidly become a performance nightmare if multiple nodes are accessed. A general rule of thumb is to index a column with low cardinality of few values. Before creating an index, be aware of when and when not to create an index.

In Cassandra 3.4 and later, a new implementation of secondary indexes, SSTable Attached Secondary Indexes (SASI) have greatly improved the performance of secondary indexes and should be used, if possible.

Procedure

  • The table rank_by_year_and_name can yield the rank of cyclists for races.
    cqlsh> CREATE TABLE cycling.rank_by_year_and_name ( 
      race_year int, 
      race_name text, 
      cyclist_name text, 
      rank int, 
      PRIMARY KEY ((race_year, race_name), rank) 
    );
  • Both race_year and race_name must be specified as these columns comprise the partition key.
    cqlsh> SELECT * FROM cycling.rank_by_year_and_name WHERE race_year=2015 AND race_name='Tour of Japan - Stage 4 - Minami > Shinshu';


  • A logical query to try is a listing of the rankings for a particular year. Because the table has a composite partition key, this query will fail if only the first column is used in the conditional operator.
    cqlsh> SELECT * FROM cycling.rank_by_year_and_name WHERE race_year=2015;


  • An index is created for the race year, and the query will succeed. An index name is optional and must be unique within a keyspace. If you do not provide a name, Cassandra will assign a name like race_year_idx.
    cqlsh> CREATE INDEX ryear ON cycling.rank_by_year_and_name (race_year);
    SELECT * FROM cycling.rank_by_year_and_name WHERE race_year=2015;


  • A clustering column can also be used to create an index. An index is created on rank, and used in a query.
    cqlsh> CREATE INDEX rrank ON cycling.rank_by_year_and_name (rank);
    SELECT * FROM cycling.rank_by_year_and_name WHERE rank = 1;