Create and use secondary indexes (2i)
SAI is almost always a better choice than 2i. |
Secondary indexing (2i) is the original secondary indexing of CQL, but today, it suffers from poor performance and latency. If you are building new indexes, use SAI instead.
2i adds column-level indexes to any CQL table column of any CQL data type, except for a counter column. However, the indexes are locally built on each CQL node in a cluster, so using 2i for queries results in poor performance.
There are distinct conditions about when you should and shouldn’t use a 2i index.
Problems using a high-cardinality column index
A high-cardinality column is a column that has many distinct values, such as a column that contains a unique ID for each row. If you create an index on a high-cardinality column, which has many distinct values, a query between the fields incurs many seeks for very few results. In the table with a billion cycling races, looking up races by cyclist winners (a value that is typically unique for each song) instead of by their team is likely to be very inefficient.
It would probably be more efficient to manually maintain the table as a form of an index instead of using the built-in index. For columns containing unique data, it is sometimes better for performance to use an index for convenience, as long as the query volume to the table having an indexed column is moderate and not under constant load.
Conversely, creating an index on an extremely low-cardinality column, such as a boolean column, does not make sense.
Each value in the index becomes a single row in the index, resulting in a huge row for all the false values, for example.
Indexing a multitude of indexed columns having foo = true
and foo = false
is not useful.
An index on a column with low cardinality can boost read performance because the index is significantly smaller than the column. An index for a high-cardinality column may reduce performance. If your application requires a search on a high-cardinality column, a materialized view is ideal.
Problems using an index on a frequently updated or deleted column
The database stores tombstones in the index until the tombstone limit reaches 100K cells. After exceeding the tombstone limit, the query that uses the indexed value will fail.
Problems using an index to look for a row in a large partition unless narrowly queried
A query on an indexed column in a large cluster typically requires collating responses from multiple data partitions. The query response slows down as more machines are added to the cluster. When looking for a row in a large partition, narrow the search to avoid query performance degradation.
Create a 2i index
Secondary indexing allows you create one or more secondary indexes on the same database table. 2i indexes can be used to query a table using any column other than the table’s partition key. If a composite partition key is defined, a 2i index can be created on any of the partition key columns.
In a production environment, certain columns might not be good choices, depending on their cardinality.
If you decide to create an index, always create either a secondary index (2i) or a storage-attached index (SAI) on a table column, but not both. CQL will be harder to tune and understanding performance will be more difficult if you create both types of indexes on the same table. See the difference between these index types in the overview. |
To create a 2i index, define the table name and column name for the column to be indexed:
CREATE INDEX id_idx ON cycling.cyclist_alt_stats (id);
Index name is optional. However, the default name might not be descriptive enough. Index names are unique per keyspace because indexes are created at the keyspace level and not at the table level.
The index name must be a unique identifier for the index for each table within a keyspace. Enclose in quotes to use special characters or preserve capitalization. If you do not specify an index name, CQL generates one for you, with the pattern <table_name>_<column_name>_idx. This requirement is true for all indexes.
If you use IF NOT EXISTS
in a CREATE [CUSTOM] INDEX
command, the command fails silently if an index with the same name already exists in the keyspace.
If you want the command to return an error when an index with the same name already exists, don’t use IF NOT EXISTS
.
Query with a 2i index
2i secondary indexes are not a replacement for primary keys, but they can be used to improve query performance in some cases.
After you create a 2i secondary index, query your data with the 2i secondary index:
SELECT * FROM cycling.rank_by_year_and_name
WHERE race_year = 2014
AND race_name = 'Tour of Japan - Stage 4 - Minami > Shinshu';
Results
race_year | race_name | rank | cyclist_name
-----------+--------------------------------------------+------+----------------------
2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Daniel MARTIN
2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 2 | Johan Esteban CHAVES
2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 3 | Benjamin PRADES
(3 rows)
For more examples, see Secondary index (2i) examples.
Get a 2i index
Use DESCRIBE TABLE
to check if a 2i secondary index exists and view its definition:
DESCRIBE INDEX cycling.blist_keys_idx;
CREATE INDEX blist_keys_idx ON cycling.birthday_list (keys(blist));
Alter a 2i index
A secondary index (2i) cannot be altered. If you need to modify a 2i index, you will need to drop the current index, and then create a new index.
-
Drop index:
DROP INDEX IF EXISTS cycling.rank_idx;
-
Create new index:
CREATE INDEX IF NOT EXISTS rank_idx ON cycling.rank_by_year_and_name (rank);
Drop a 2i index
Secondary index (2i) can be dropped (deleted):
DROP INDEX IF EXISTS cycling.teams_idx;