Using a secondary index
Use CQL to create a secondary index on a column after defining a table.
Create indexes on a column after defining a table. Secondary indexes are used to query a table using a column that cannot normally be queried. DataStax Enterprise supports indexing a collection column and indexing a static column.
Secondary indexes can impact performance greatly. The index table is stored on each node in a cluster, so a query involving a secondary index can rapidly become a performance problem if multiple nodes are accessed. A general rule is to index a column with low cardinality of few values. Before creating an index, be aware of when not to create an index.
Procedure
-
The table
rank_by_year_and_name
stores the rank of cyclists for races.CREATE TABLE IF NOT EXISTS cycling.rank_by_year_and_name ( race_year int, race_name text, cyclist_name text, rank int, PRIMARY KEY ((race_year, race_name), rank) );
-
Both
race_year
andrace_name
must be specified in a query because those columns comprise the partition key.SELECT * FROM cycling.rank_by_year_and_name WHERE race_year = 2014 AND race_name = 'Tour of Japan - Stage 4 - Minami > Shinshu';
race_year | race_name | rank | cyclist_name -----------+--------------------------------------------+------+---------------------- 2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Daniel MARTIN 2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 2 | Johan Esteban CHAVES 2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 3 | Benjamin PRADES (3 rows)
-
A logical query to try is a listing of the rankings for a particular year.
Because the table has a composite partition key with
race_year
andrace_name
, the following query fails because only therace_year
column is specified in theWHERE
clause.SELECT * FROM cycling.rank_by_year_and_name WHERE race_year = 2014;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
-
In the following example, an index is created for the race year and the query
succeeds. An index name is optional and must be unique within a keyspace. If you
do not provide an index name, a name like
race_year_idx
is automatically assigned. Typically, you should explicitly set the index name.CREATE INDEX IF NOT EXISTS race_year_idx ON cycling.rank_by_year_and_name (race_year);
SELECT * FROM cycling.rank_by_year_and_name WHERE race_year = 2014;
race_year | race_name | rank | cyclist_name -----------+--------------------------------------------+------+---------------------- 2014 | 4th Tour of Beijing | 1 | Phillippe GILBERT 2014 | 4th Tour of Beijing | 2 | Daniel MARTIN 2014 | 4th Tour of Beijing | 3 | Johan Esteban CHAVES 2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Daniel MARTIN 2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 2 | Johan Esteban CHAVES 2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 3 | Benjamin PRADES (6 rows)
-
An index can also be added for a clustering column. In the following example,
an index is created on the
rank
column, andrank
is specified in the query'sWHERE
clause.CREATE INDEX IF NOT EXISTS rank_idx ON cycling.rank_by_year_and_name (rank);
SELECT * FROM cycling.rank_by_year_and_name WHERE rank = 1;
race_year | race_name | rank | cyclist_name -----------+--------------------------------------------+------+------------------- 2014 | 4th Tour of Beijing | 1 | Phillippe GILBERT 2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Daniel MARTIN 2015 | Giro d'Italia - Stage 11 - Forli > Imola | 1 | Ilnur ZAKARIN 2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Benjamin PRADES (4 rows)