Using CQL to create a secondary index on a column after defining a table.
Using CQL, you can create an index on a column after defining a table. You can also
index a collection column. Secondary indexes
are used to query a table using a column that is not normally queryable.
Secondary indexes are tricky to use and can impact performance greatly. The index
table is stored on each node in a cluster, so a query involving a secondary index
can rapidly become a performance nightmare if multiple nodes are accessed. A general
rule of thumb is to index a column with low cardinality of few values. Before
creating an index, be aware of when and when not to create an index.
Procedure
-
The table rank_by_year_and_name can yield the rank of
cyclists for races.
cqlsh> CREATE TABLE cycling.rank_by_year_and_name (
race_year int,
race_name text,
cyclist_name text,
rank int,
PRIMARY KEY ((race_year, race_name), rank)
);
-
Both race_year and race_name must be
specified as these columns comprise the partition key.
cqlsh> SELECT * FROM cycling.rank_by_year_and_name WHERE race_year=2015 AND race_name='Tour of Japan - Stage 4 - Minami > Shinshu';
-
A logical query to try is a listing of the rankings for a particular year.
Because the table has a composite partition key, this query will fail if only
the first column is used in the conditional operator.
cqlsh> SELECT * FROM cycling.rank_by_year_and_name WHERE race_year=2015;
-
An index is created for the race year, and the query will succeed. An index
name is optional and must be unique within a keyspace. If you do not provide a
name, Cassandra will assign a name like
race_year_idx.
cqlsh> CREATE INDEX ryear ON cycling.rank_by_year_and_name (race_year);
SELECT * FROM cycling.rank_by_year_and_name WHERE race_year=2015;
-
A clustering column can also be used to create an index. An index is created on
rank, and used in a query.
cqlsh> CREATE INDEX rrank ON cycling.rank_by_year_and_name (rank);
SELECT * FROM cycling.rank_by_year_and_name WHERE rank = 1;