Create SASI index
Create indexes on one or more columns after defining a table. Secondary indexes created with SASI can be used to query a table using any column other than the table’s partition key.
Do not create a secondary index (2i) or a storage-attached index (SAI) on a table column, if you are creating a SASI index. CQL will be harder to tune and understanding performance will be more difficult if you create both types of indexes on the same table. See the difference between these index types in the overview. |
To create an SASI index, define the table name and column name for the column to be indexed. Index name is optional. Index names are unique per keyspace. The index name must be a unique identifier for the index for each table within a keyspace. Enclose in quotes to use special characters or preserve capitalization. If you do not specify an index name, CQL generates one for you, with the pattern <table_name>_<column_name>_idx. This requirement is true for all indexes.
Examples
Let’s start with some simple SASI examples.
Primary key columns
Make an index on primary key columns, either composite partition key columns or clustering columns.
Single primary key column
This index creation will fail, because the table has a single PRIMARY KEY
column.
Such indexes are not allowed.
-
Create a table called `cycling.cyclist_base with a primary key if it doesn’t already exist.
CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
id
column.+
+ .Results
Details
This index creation will fail, because the id
column is the primary key, as well as the single partition key.
Such indexes are not allowed.
Composite partition key columns in primary key
A composite partition key uses two or more columns to define the partition key in the PRIMARY KEY
definition.
This index creation will fail, because SASI indexes is not allowed on PRIMARY KEY
partition key columns.
-
Create a table called
cycling.rank_by_year_and_name
with a composite partition key and a clustering column if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.rank_by_year_and_name ( race_year int, race_name text, cyclist_name text, rank int, PRIMARY KEY ((race_year, race_name), rank) );
-
Create a SASI index on the
race_year
andrace_name
columns These two columns are the composite partition key in thePRIMARY KEY
definition.Results
ConfigurationException: partition key columns are not yet supported by SASI
You can see that partition key SASI indexes are not allowed.
Let’s try with the last part of a PRIMARY KEY
definition, the clustering column.
Clustering column in primary key
Text clustering column
A clustering column sorts the rows within a partition. Clustering columns are part of the PRIMARY KEY
definition.
To filter a query using just a clustering column, you must create a SASI index on the column that you wish to use for the query.
-
Create a table called
cycling.country_flag
with a primary key that includes a clustering column if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.country_flag ( country text, cyclist_name text, flag int STATIC, PRIMARY KEY (country, cyclist_name) );
-
Create a SASI index on the
cyclist_name
TEXT clustering column. -
Verify the created indexes by checking the index schema.
Results
Now that we have created the table and index, let’s try some queries.
-
The first example tab shows a query with only the clustering column.
-
The second example tab displays a failing query due to case sensitivity.
-
The third example tab shows a query with the clustering column and the partition key column.
-
SELECT 1 - cyclist_name
-
SELECT 2 - ERROR
-
SELECT 3 - AND
Query the table using only the cyclist_name
column.
The SASI index was created with the default mode of PREFIX
, so the query can use the LIKE
operator.
Results
Query the table using the cyclist_name
column, but using a matching pattern of 'jac%' instead of 'Jac%'.
Results
Query the table using the cyclist_name
column and the country
partition key column.
Results
Non-primary key columns
Indexes can be created on non-primary key columns of almost any type.
Text column
TEXT column SASI indexes can be created with either the default NontokenizingAnalyzer
or set to StandardAnalyzer
and DelimiterAnalyzer
.
The NontokenizingAnalyzer
is the default, and is used for non-tokenized searches.
Using the NontokenizingAnalyzer
, SASI indexes created on TEXT columns have two modes: PREFIX
and CONTAINS
.
-
The
PREFIX
mode is used for queries that use theLIKE
operator with a prefix pattern,LIKE 'prefix%'
or an equality operator,= 'value'
. This mode is the default if no mode is specified. -
The
CONTAINS
mode is used for queries that use theLIKE
operator with a pattern that contains a suffix,LIKE '%suffix'
or a substring,LIKE '%substring%'
as well as the prefix and equality operators.
Indexes using the CONTAINS
mode have a 4-6X larger stored index file size, so although prefixes and equalities can be created, there is a disadvantage.
Three options exist if the analyzer is set to NontokenizingAnalyzer
(the default):
-
The
case_sensitive
option is set totrue
by default, but can be set tofalse
to allow case-insensitive searches. -
The
normalize_lowercase
option is set tofalse
by default, but can be set tofalse
. -
The
normalize_uppercase
option is set tofalse
by default, but can be set tofalse
.
The StandardAnalyzer
is used for tokenized searches.
This analyzer is used for tokenized searches that permit stemming, and is also used for non-English languages, such as Chinese, Japanese, and Korean.
The analyzer
must be set to True
to use the StandardAnalyzer
.
Five options exist if the analyzer is set to StandardAnalyzer
:
-
The
tokenization_locale
option is set toen
by default, but can be set to other languages. -
The
tokenization_enable_stemming
option is set tofalse
by default, but can be set totrue
. -
The
tokenization_skip_stop_words
option is set tofalse
by default, but can be set totrue
. -
The
tokenization_normalize_lowercase
option is set tofalse
by default, but can be set tofalse
. -
The
tokenization_normalize_uppercase
option is set tofalse
by default, but can be set tofalse
.
Finally, the DelimiterAnalyzer
is used for tokenized searches that store a column as a delimited list of values.
There is one additional option if the analyzer is set to DelimiterAnalyzer
besides the options for StandardAnalyzer
:
-
The
delimiter
option is set to','
by default, but can be set to other delimiters.
Several examples using many of the modes and analyzers are illustrated here.
NontokenizingAnalyzer
PREFIX equality, case_sensitive: true
Equalities can be used with the PREFIX
mode, and the case_sensitive
option is set to true
by default, in this example.
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create a SASI index on the
name
column, a non-primary key TEXT column to use PREFIX and case sensitivity. -
Verify the created index by checking the index schema.
Results
Now that we have created the table and index, query using a value for the column.
Query the table using the name
column:
Results
PREFIX with case_sensitive: true
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create a SASI index on the
name
column, a non-primary key TEXT column to use PREFIX and case sensitivity. -
Verify the created indexes by checking the index schema.
Results
Now that we have created the table and index, query using a value for the column. There are two examples, one with a matching pattern and one with a non-matching pattern that involves case sensitivity.
-
SELECT 1 - matching
-
SELECT 2 - non-matching
Query the table using the name
column using a matching pattern of 'A%':
Results
Query the table using the name
column using a non-matching pattern of 'a%'.
Results
This query will return no results, as the pattern is case sensitive.
PREFIX with case_sensitive: false
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create a SASI index on the
name
column, a non-primary key TEXT column to use PREFIX and no case sensitivity. -
Verify the created indexes by checking the index schema.
Results
Now that we have created the table and index, query using a value for the column. There are two examples, each with a matching pattern that shows that no case sensitivity will match both patterns.
-
SELECT 1 - matching
-
SELECT 2 - matching
Query the table using the name
column using a matching pattern of 'A%'.
Results
Query the table using the name
column using a matching pattern of 'a%'.
Results
Now that case sensitivity is not an issue, both patterns will match.
CONTAINS SUFFIX with case_sensitive: true
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create a SASI index on the
name
column, a non-primary key TEXT column to use CONTAINS and no case sensitivity. -
Verify the created indexes by checking the index schema.
Results
Now that we have created the table and index, query using a value for the column. There are two examples, one with a matching pattern and one with a non-matching pattern that will fail due to case sensitivity.
-
SELECT 1 - matching
-
SELECT 2 - non-matching
Query the table using the name
column using a matching pattern of '%GO':
Results
Query the table using the name
column using a non-matching pattern of '%go':
Results
Again, the pattern will fail due to case sensitivity.
CONTAINS SUFFIX with case_sensitive: false
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create a SASI index on the
name
column, a non-primary key TEXT column to use CONTAINS and no case sensitivity. -
Verify the created indexes by checking the index schema.
Results
Now that we have created the table and index, query using a value for the column. There are two examples, each with a matching pattern that shows that no case sensitivity will match both patterns.
-
SELECT 1 - matching
-
SELECT 2 - matching
Query the table using the name
column using a matching pattern of '%GO':
Results
Query the table using the name
column using a matching pattern of '%go':
Results
Now that case sensitivity is not an issue, both patterns will match.
CONTAINS SUBSTRING with case_sensitive: true
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist. -
Create a SASI index on the
name
column, a non-primary key TEXT column to use CONTAINS and case sensitivity. -
Verify the created indexes by checking the index schema.
Now that we have created the table and index, query using a value for the column. There are two examples, each with a matching pattern that shows that no case sensitivity will match both patterns.
-
SELECT 1 - matching
-
SELECT 2 - non-matching
Query the table using the name
column using a matching pattern of '%am WE%':
Results
Query the table using the name
column using a non-matching pattern of '%am WE%':
Results
This query will return no results, as the pattern is case sensitive.
StandardAnalyzer
Stemming verbs
-
Create a table called
cycling.cyclist_base
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Add a
bio
column, a non-primary key TEXT column to the table to demonstrate this type of index. -
Create a SASI index on the
bio
column, a non-primary key TEXT column using theStandardAnalyzer
. -
Verify the created indexes by checking the index schema.
Now that we have created the table and index, add some data and query using a value for the column.
-
SELECT 1 - stemming
-
SELECT 2 - stemming 2
Query the table using the bio
column using a stemming match:
Results
Query the table using the bio
column using another stemming match:
Results
Delimiter based Tokenization Analysis
The DelimiterAnalyzer
is used for tokenized searches that store a column as a delimited list of values.
This provides an alternative to indexing collections,
as delimiter separated text can be indexed without the overhead of CONTAINS
mode nor using PREFIX
or SUFFIX
queries.
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist. -
Alter the table
cycling.cyclist_base
with TEXT column calledaliases
to demonstrate this type of index. -
Create SASI index on the
aliases`column, a non-primary key TEXT column to use the `DelimiterAnalyzer
with appropriate options. -
Verify the created indexes by checking the index schema.
Now that we have created the table and index, query using a value for the column.
After updating the table with some data, use a matching pattern for aliases
that checks for one of the aliases listed in the column.
Query the table using the aliases
column using a matching pattern of Stevie
which uses a SASI index to complete the query.
Results
Non-text columns
All other non-text columns use the default PREFIX
or SPARSE
mode.
The SPARSE
mode has been designed primarily to index very unique values and allow efficient storage and efficient range query.
This mode is used for situations where each indexed value has a maximum of five matching rows.
If there are more than five matching rows, an error will be returned.
These characteristics make the SPARSE
mode suitable for indexing columns with unique values, such as timestamps, UUIDs, and other unique identifiers.
In either mode, the operator =
can be used for equality queries, and the operators >
, >=
, <
, and ⇐
can be used for range queries.
These are the only operations for non-text columns using SASI indexing.
INT column with PREFIX
mode
-
Create a table called
cycling.cyclist_base
with INT column calledage
if it doesn’t already exist. -
Create a SASI index on the
age
column, a non-primary key INT column using thePREFIX
mode. -
Verify the created indexes by checking the index schema.
Results
Now that we have created the table and index, query using a value for the column.
-
The first example tab shows a query with the
age
column in an equality. -
The second example tab displays a query with the
age
column in a range query. -
The third example tab shows a failing query due trying to use two age ranges with an
AND
operator.
-
SELECT 1 - equality
-
SELECT 2 - inequality
-
SELECT 3 - AND inequality failure
Query the table using the age
column using an equality:
Results
Query the table using the age
column using an inequality range:
Results
Query the table using the age
column using two inequality ranges with an AND
operator:
Results
Even adding ALLOW FILTERING
will not allow this query to succeed.
DATE column with SPARSE
mode
-
Create a table called
cycling.cyclist_base
with DATE column calledbirthday
if it doesn’t already exist. -
Create a SASI index on the
birthday
column, a non-primary key DATE column using theSPARSE
mode. -
Verify the created indexes by checking the index schema.
Now that we have created the table and index, query using a value for the column.
-
SELECT 1 - equality
-
SELECT 2 - inequality
Query the table using the birthday
column using an equality:
Results
Query the table using the birthday
column using an inequality range:
Results
Compound queries using two or more SASI indexes
SASI supports queries with multiple predicates.
However, due to the nature of the default indexing implementation, CQL requires the user to specify ALLOW FILTERING
.
Normally, adding ALLOW FILTERING
opens the user to the potential performance pitfalls of such a query.
With SASI, while the requirement to include ALLOW FILTERING
remains, to reduce modifications to the grammar, the performance pitfalls do not exist because filtering is not performed.
SASI AND non-indexed column
SASI also supports filtering on non-indexed columns like age
.
The expression can only narrow down an existing query using AND
.
To use a non-indexed column, the SASI index must be used first, and then the non-indexed column can be used to further filter the results.
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist. -
Create a SASI index on the
name
column, a non-primary key TEXT column to use CONTAINS and case sensitivity. -
Verify the created indexes by checking the index schema.
Results
Now that you have created the table and index, query using a value for the column.
Use a matching pattern along with a non-indexed value for the age
column.
Query the table using the name
column using a matching pattern of '%A' and age > 20:
Results
Two or more SASI columns
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist. -
Create SASI indexes on the
name
andbirthday
column, a non-primary key TEXT column to use CONTAINS and case sensitivity and a DATE column using defaults. -
Verify the created indexes by checking the index schema.
Now that we have created the table and index, query using a value for the column.
Use a matching pattern for name
along with range query for the birthday
column.
Query the table using the name
column using a matching pattern of '%A' and a range query on birthday
, both of which use SASI indexes to complete the query.
Results
See CREATE INDEX for more information about creating SASI indexes.