Using a secondary index

Using CQL to create a secondary index on a column after defining a table.

Secondary indexes are used to query a table by a column that is not normally queryable.

Secondary indexes can greatly affect performance. The index table is stored on each node in a cluster, so a query that uses a secondary index can quickly become a performance problem if it has to access multiple nodes. As a general rule, index a column with low cardinality, that is, a column with few distinct values. Before creating an index, be aware of when not to create one.

Procedure

The table rank_by_year_and_name stores the rank of cyclists for races.

```
CREATE TABLE cycling.rank_by_year_and_name (
  race_year int,
  race_name text,
  cyclist_name text,
  rank int,
  PRIMARY KEY ((race_year, race_name), rank)
);
```
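
The procedure does not show how the table was populated. As a sketch only, rows matching the query results shown in this section could be loaded with INSERT statements like the following. The values are copied from that output; the statements themselves are an assumption about how the data was loaded. A single quote inside a text value is escaped by doubling it.

```cql
-- Rows taken from the query output shown in this section.
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank)
VALUES (2015, 'Tour of Japan - Stage 4 - Minami > Shinshu', 'Benjamin PRADES', 1);

INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank)
VALUES (2015, 'Tour of Japan - Stage 4 - Minami > Shinshu', 'Adam PHELAN', 2);

INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank)
VALUES (2015, 'Tour of Japan - Stage 4 - Minami > Shinshu', 'Thomas LEBAS', 3);

-- The apostrophe in "d'Italia" is written as two single quotes.
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank)
VALUES (2015, 'Giro d''Italia - Stage 11 - Forli > Imola', 'Ilnur ZAKARIN', 1);

INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank)
VALUES (2015, 'Giro d''Italia - Stage 11 - Forli > Imola', 'Carlos BETANCUR', 2);

INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank)
VALUES (2014, '4th Tour of Beijing', 'Phillippe GILBERT', 1);
```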

Both race_year and race_name must be specified in a query, because together these columns make up the partition key.

```
SELECT *
FROM cycling.rank_by_year_and_name
WHERE race_year = 2015
  AND race_name = 'Tour of Japan - Stage 4 - Minami > Shinshu';
```
```
race_year | race_name                                  | rank | cyclist_name
----------+--------------------------------------------+------+-----------------
     2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 | Benjamin PRADES
     2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 |     Adam PHELAN
     2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    3 |    Thomas LEBAS
```

A logical query to try is a listing of the rankings for a particular year. Because the table has a composite partition key, this query fails if the WHERE clause restricts only the first partition key column.
```
SELECT *
FROM cycling.rank_by_year_and_name
WHERE race_year = 2015;
```
```
InvalidRequest: code=2200 [Invalid query] message=
"Partition key parts: race_name must be restricted as other parts are"
```

After an index is created on race_year, the query succeeds. An index name is optional; if you provide one, it must be unique within the keyspace. If you do not provide a name, DataStax Distribution of Apache Cassandra™ assigns a name based on the table and column names, like rank_by_year_and_name_race_year_idx.

```
CREATE INDEX ryear ON
cycling.rank_by_year_and_name (race_year);
```
```
SELECT *
FROM cycling.rank_by_year_and_name
WHERE race_year = 2015;
```
```
race_year | race_name                                  | rank | cyclist_name
----------+--------------------------------------------+------+-----------------
     2015 |   Giro d'Italia - Stage 11 - Forli > Imola |    1 |   Ilnur ZAKARIN
     2015 |   Giro d'Italia - Stage 11 - Forli > Imola |    2 | Carlos BETANCUR
     2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 | Benjamin PRADES
     2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 |     Adam PHELAN
     2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    3 |    Thomas LEBAS
```

An index can also be created on a clustering column. Here, an index is created on rank and then used in a query.

```
CREATE INDEX rrank ON
cycling.rank_by_year_and_name (rank);
```
```
SELECT *
FROM cycling.rank_by_year_and_name
WHERE rank = 1;
```
```
race_year | race_name                                  | rank | cyclist_name
----------+--------------------------------------------+------+-------------------
     2014 |                        4th Tour of Beijing |    1 | Phillippe GILBERT
     2015 |   Giro d'Italia - Stage 11 - Forli > Imola |    1 |     Ilnur ZAKARIN
     2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |   Benjamin PRADES
```
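
Because secondary indexes carry a performance cost, it can be useful to review which indexes exist on a table and to remove any that are no longer needed. The following is a minimal sketch, not part of the original procedure. It assumes a release in which index metadata is exposed through the system_schema.indexes table (Apache Cassandra 3.0 and later) and that the ryear and rrank indexes were created as shown above.

```cql
-- List the indexes defined on the table.
-- Assumes index metadata is available in system_schema.indexes.
SELECT index_name, kind, options
FROM system_schema.indexes
WHERE keyspace_name = 'cycling'
  AND table_name = 'rank_by_year_and_name';

-- Remove an index that is no longer needed.
-- IF EXISTS avoids an error when the index has already been dropped.
DROP INDEX IF EXISTS cycling.rrank;
```

After an index is dropped, queries that relied on it, such as the query restricted only by rank, are rejected again unless the query is changed.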