Using a secondary index
Use CQL to create a secondary index on a column after defining a table.
Create indexes on a column after defining a table. Secondary indexes are used to query a table using a column that cannot normally be queried. DataStax Enterprise supports indexing a collection column and indexing a static column.
Secondary indexes are problematic to use and can impact performance greatly. The index table is stored on each node in a cluster, so a query involving a secondary index can rapidly become a significant performance problem if multiple nodes are accessed.
A general rule is to index a column with low cardinality of few values. Before creating an index, be aware of when and when not to create an index.
Procedure
-
The table rank_by_year_and_name can yield the rank of
cyclists for races.
CREATE TABLE cycling.rank_by_year_and_name ( race_year int, race_name text, cyclist_name text, rank int, PRIMARY KEY ((race_year, race_name), rank) );
-
Both race_year and race_name must be
specified as these columns comprise the partition key.
SELECT * FROM cycling.rank_by_year_and_name WHERE race_year = 2015 AND race_name = 'Tour of Japan - Stage 4 - Minami > Shinshu';
race_year | race_name | rank | cyclist_name ----------+--------------------------------------------+------+----------------- 2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Benjamin PRADES 2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 2 | Adam PHELAN 2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 3 | Thomas LEBAS
-
A logical query to try is a listing of the rankings for a particular year.
Because the table has a composite partition key, this query fails if only the
first column is used in the conditional operator.
SELECT * FROM cycling.rank_by_year_and_name WHERE race_year = 2015;
InvalidRequest: code=2200 [Invalid query] message= "Partition key parts: race_name must be restricted as other parts are"
-
In this example, an index is created for the race year and the query succeeds.
An index name is optional and must be unique within a keyspace. If you do not
provide an index name, DSE assigns a name like
race_year_idx.
CREATE INDEX ryear ON cycling.rank_by_year_and_name (race_year); SELECT * FROM cycling.rank_by_year_and_name WHERE race_year = 2015;
race_year | race_name | rank | cyclist_name ----------+--------------------------------------------+------+----------------- 2015 | Giro d'Italia - Stage 11 - Forli > Imola | 1 | Ilnur ZAKARIN 2015 | Giro d'Italia - Stage 11 - Forli > Imola | 2 | Carlos BETANCUR 2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Benjamin PRADES 2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 2 | Adam PHELAN 2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 3 | Thomas LEBAS
-
A clustering column can also be used to create an index. An index is created on
rank, and used in a query.
CREATE INDEX rrank ON cycling.rank_by_year_and_name (rank); SELECT * FROM cycling.rank_by_year_and_name WHERE rank = 1;
race_year | race_name | rank | cyclist_name ----------+--------------------------------------------+------+------------------- 2014 | 4th Tour of Beijing | 1 | Phillippe GILBERT 2015 | Giro d'Italia - Stage 11 - Forli > Imola | 1 | Ilnur ZAKARIN 2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Benjamin PRADES