Create SASI index

Astra DB and DSE 6.9 do not support SSTable-Attached Secondary Indexing (SASI). SASI is an experimental feature in Apache Cassandra® and Hyper-Converged Database, and a beta feature in DataStax Enterprise 6.8.

DataStax does not recommend using SASI in production. Instead, use Storage Attached Indexes (SAI) for indexing in Cassandra 5.0 or later, DSE 6.8 or later, HCD, and Astra DB.

Create indexes on one or more columns after defining a table. Secondary indexes created with SASI can be used to query a table using any column other than the table’s partition key. Make sure that you do not create multiple index types on the same table.

To create an SASI index, define the table name and column name for the column to be indexed. Index name is optional. Index names are unique per keyspace. The index name must be a unique identifier for the index for each table within a keyspace. Enclose in quotes to use special characters or preserve capitalization. If you do not specify an index name, CQL generates one for you, with the pattern <table_name>_<column_name>_idx. This requirement is true for all indexes.

Prerequisites

Examples

Let’s start with some simple SASI examples.

Primary key columns

Make an index on primary key columns, either composite partition key columns or clustering columns.

Single primary key column

This index creation will fail, because the table has a single PRIMARY KEY column. Such indexes are not allowed.

Create a table called `cycling.cyclist_base with a primary key if it doesn’t already exist.

CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
  cid UUID PRIMARY KEY,
  name text,
  age int,
  birthday date,
  country text
);

Create a SASI index on the id column.
Results

This index creation will fail, because the id column is the primary key, as well as the single partition key. Such indexes are not allowed.

Composite partition key columns in primary key

A composite partition key uses two or more columns to define the partition key in the PRIMARY KEY definition. This index creation will fail, because SASI indexes is not allowed on PRIMARY KEY partition key columns.

Create a table called cycling.rank_by_year_and_name with a composite partition key and a clustering column if it doesn’t already exist.

CREATE TABLE IF NOT EXISTS cycling.rank_by_year_and_name (
  race_year int,
  race_name text,
  cyclist_name text,
  rank int,
  PRIMARY KEY ((race_year, race_name), rank)
);

Create a SASI index on the race_year and race_name columns These two columns are the composite partition key in the PRIMARY KEY definition.
Results
ConfigurationException: partition key columns are not yet supported by SASI
You can see that partition key SASI indexes are not allowed.

Let’s try with the last part of a PRIMARY KEY definition, the clustering column.

Clustering column in primary key

Text clustering column

A clustering column sorts the rows within a partition. Clustering columns are part of the PRIMARY KEY definition. To filter a query using just a clustering column, you must create a SASI index on the column that you wish to use for the query.

Create a table called cycling.country_flag with a primary key that includes a clustering column if it doesn’t already exist.

CREATE TABLE IF NOT EXISTS cycling.country_flag (
  country text,
  cyclist_name text,
  flag int STATIC,
  PRIMARY KEY (country, cyclist_name)
);

Create a SASI index on the cyclist_name TEXT clustering column.
Verify the created indexes by checking the index schema.
Results

Now that we have created the table and index, let’s try some queries.

The first example tab shows a query with only the clustering column.
The second example tab displays a failing query due to case sensitivity.
The third example tab shows a query with the clustering column and the partition key column.

SELECT 1 - cyclist_name
SELECT 2 - ERROR
SELECT 3 - AND

Query the table using only the cyclist_name column. The SASI index was created with the default mode of PREFIX, so the query can use the LIKE operator.

Results

Query the table using the cyclist_name column, but using a matching pattern of 'jac%' instead of 'Jac%'.

Results

Query the table using the cyclist_name column and the country partition key column.

Results

Non-primary key columns

Indexes can be created on non-primary key columns of almost any type.

Text column

TEXT column SASI indexes can be created with either the default NontokenizingAnalyzer or set to StandardAnalyzer and DelimiterAnalyzer.

The NontokenizingAnalyzer is the default, and is used for non-tokenized searches. Using the NontokenizingAnalyzer, SASI indexes created on TEXT columns have two modes: PREFIX and CONTAINS.

The PREFIX mode is used for queries that use the LIKE operator with a prefix pattern, LIKE 'prefix%' or an equality operator, = 'value'. This mode is the default if no mode is specified.
The CONTAINS mode is used for queries that use the LIKE operator with a pattern that contains a suffix, LIKE '%suffix' or a substring, LIKE '%substring%' as well as the prefix and equality operators.

Indexes using the CONTAINS mode have a 4-6X larger stored index file size, so although prefixes and equalities can be created, there is a disadvantage.

Three options exist if the analyzer is set to NontokenizingAnalyzer (the default):

The case_sensitive option is set to true by default, but can be set to false to allow case-insensitive searches.
The normalize_lowercase option is set to false by default, but can be set to false.
The normalize_uppercase option is set to false by default, but can be set to false.

The StandardAnalyzer is used for tokenized searches. This analyzer is used for tokenized searches that permit stemming, and is also used for non-English languages, such as Chinese, Japanese, and Korean. The analyzer must be set to True to use the StandardAnalyzer.

Five options exist if the analyzer is set to StandardAnalyzer:

The tokenization_locale option is set to en by default, but can be set to other languages.
The tokenization_enable_stemming option is set to false by default, but can be set to true.
The tokenization_skip_stop_words option is set to false by default, but can be set to true.
The tokenization_normalize_lowercase option is set to false by default, but can be set to false.
The tokenization_normalize_uppercase option is set to false by default, but can be set to false.

Finally, the DelimiterAnalyzer is used for tokenized searches that store a column as a delimited list of values.

There is one additional option if the analyzer is set to DelimiterAnalyzer besides the options for StandardAnalyzer:

The delimiter option is set to ',' by default, but can be set to other delimiters.

Several examples using many of the modes and analyzers are illustrated here.

NontokenizingAnalyzer

PREFIX equality, case_sensitive: true

Equalities can be used with the PREFIX mode, and the case_sensitive option is set to true by default, in this example.

Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.

CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
  cid UUID PRIMARY KEY,
  name text,
  age int,
  birthday date,
  country text
);

Create a SASI index on the name column, a non-primary key TEXT column to use PREFIX and case sensitivity.
Verify the created index by checking the index schema.
Results

Now that we have created the table and index, query using a value for the column.

Query the table using the name column:

Results

PREFIX with case_sensitive: true

Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.

CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
  cid UUID PRIMARY KEY,
  name text,
  age int,
  birthday date,
  country text
);

Create a SASI index on the name column, a non-primary key TEXT column to use PREFIX and case sensitivity.
Verify the created indexes by checking the index schema.
Results

Now that we have created the table and index, query using a value for the column. There are two examples, one with a matching pattern and one with a non-matching pattern that involves case sensitivity.

SELECT 1 - matching
SELECT 2 - non-matching

Query the table using the name column using a matching pattern of 'A%':

Results

Query the table using the name column using a non-matching pattern of 'a%'.

Results

This query will return no results, as the pattern is case sensitive.

PREFIX with case_sensitive: false

Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.

CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
  cid UUID PRIMARY KEY,
  name text,
  age int,
  birthday date,
  country text
);

Create a SASI index on the name column, a non-primary key TEXT column to use PREFIX and no case sensitivity.
Verify the created indexes by checking the index schema.
Results

Now that we have created the table and index, query using a value for the column. There are two examples, each with a matching pattern that shows that no case sensitivity will match both patterns.

SELECT 1 - matching
SELECT 2 - matching

Query the table using the name column using a matching pattern of 'A%'.

Results

Query the table using the name column using a matching pattern of 'a%'.

Results

Now that case sensitivity is not an issue, both patterns will match.

CONTAINS SUFFIX with case_sensitive: true

Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.

CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
  cid UUID PRIMARY KEY,
  name text,
  age int,
  birthday date,
  country text
);

Create a SASI index on the name column, a non-primary key TEXT column to use CONTAINS and no case sensitivity.
Verify the created indexes by checking the index schema.
Results

Now that we have created the table and index, query using a value for the column. There are two examples, one with a matching pattern and one with a non-matching pattern that will fail due to case sensitivity.

SELECT 1 - matching
SELECT 2 - non-matching

Query the table using the name column using a matching pattern of '%GO':

Results

Query the table using the name column using a non-matching pattern of '%go':

Results

Again, the pattern will fail due to case sensitivity.

CONTAINS SUFFIX with case_sensitive: false

Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.

CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
  cid UUID PRIMARY KEY,
  name text,
  age int,
  birthday date,
  country text
);

Create a SASI index on the name column, a non-primary key TEXT column to use CONTAINS and no case sensitivity.
Verify the created indexes by checking the index schema.
Results

Now that we have created the table and index, query using a value for the column. There are two examples, each with a matching pattern that shows that no case sensitivity will match both patterns.

SELECT 1 - matching
SELECT 2 - matching

Query the table using the name column using a matching pattern of '%GO':

Results

Query the table using the name column using a matching pattern of '%go':

Results

Now that case sensitivity is not an issue, both patterns will match.

CONTAINS SUBSTRING with case_sensitive: true

Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.
Create a SASI index on the name column, a non-primary key TEXT column to use CONTAINS and case sensitivity.
Verify the created indexes by checking the index schema.

Now that we have created the table and index, query using a value for the column. There are two examples, each with a matching pattern that shows that no case sensitivity will match both patterns.

SELECT 1 - matching
SELECT 2 - non-matching

Query the table using the name column using a matching pattern of '%am WE%':

Results

Query the table using the name column using a non-matching pattern of '%am WE%':

Results

This query will return no results, as the pattern is case sensitive.

StandardAnalyzer

Stemming verbs

Create a table called cycling.cyclist_base if it doesn’t already exist.

CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
  cid UUID PRIMARY KEY,
  name text,
  age int,
  birthday date,
  country text
);

Add a bio column, a non-primary key TEXT column to the table to demonstrate this type of index.
Create a SASI index on the bio column, a non-primary key TEXT column using the StandardAnalyzer.
Verify the created indexes by checking the index schema.

Now that we have created the table and index, add some data and query using a value for the column.

SELECT 1 - stemming
SELECT 2 - stemming 2

Query the table using the bio column using a stemming match:

Results

Query the table using the bio column using another stemming match:

Results

Delimiter based Tokenization Analysis

The DelimiterAnalyzer is used for tokenized searches that store a column as a delimited list of values. This provides an alternative to indexing collections, as delimiter separated text can be indexed without the overhead of CONTAINS mode nor using PREFIX or SUFFIX queries.

Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.
Alter the table cycling.cyclist_base with TEXT column called aliases to demonstrate this type of index.
Create SASI index on the aliases`column, a non-primary key TEXT column to use the `DelimiterAnalyzer with appropriate options.
Verify the created indexes by checking the index schema.

Now that we have created the table and index, query using a value for the column. After updating the table with some data, use a matching pattern for aliases that checks for one of the aliases listed in the column.

Query the table using the aliases column using a matching pattern of Stevie which uses a SASI index to complete the query.

Results

Non-text columns

All other non-text columns use the default PREFIX or SPARSE mode.

The SPARSE mode has been designed primarily to index very unique values and allow efficient storage and efficient range query. This mode is used for situations where each indexed value has a maximum of five matching rows. If there are more than five matching rows, an error will be returned. These characteristics make the SPARSE mode suitable for indexing columns with unique values, such as timestamps, UUIDs, and other unique identifiers.

In either mode, the operator = can be used for equality queries, and the operators >, >=, <, and ⇐ can be used for range queries. These are the only operations for non-text columns using SASI indexing.

INT column with `PREFIX` mode

Create a table called cycling.cyclist_base with INT column called age if it doesn’t already exist.
Create a SASI index on the age column, a non-primary key INT column using the PREFIX mode.
Verify the created indexes by checking the index schema.
Results

Now that we have created the table and index, query using a value for the column.

The first example tab shows a query with the age column in an equality.
The second example tab displays a query with the age column in a range query.
The third example tab shows a failing query due trying to use two age ranges with an AND operator.

SELECT 1 - equality
SELECT 2 - inequality
SELECT 3 - AND inequality failure

Query the table using the age column using an equality:

Results

Query the table using the age column using an inequality range:

Results

Query the table using the age column using two inequality ranges with an AND operator:

Results

Even adding ALLOW FILTERING will not allow this query to succeed.

DATE column with SPARSE mode

Create a table called cycling.cyclist_base with DATE column called birthday if it doesn’t already exist.
Create a SASI index on the birthday column, a non-primary key DATE column using the SPARSE mode.
Verify the created indexes by checking the index schema.

Now that we have created the table and index, query using a value for the column.

SELECT 1 - equality
SELECT 2 - inequality

Query the table using the birthday column using an equality:

Results

Query the table using the birthday column using an inequality range:

Results

Compound queries using two or more SASI indexes

SASI supports queries with multiple predicates. However, due to the nature of the default indexing implementation, CQL requires the user to specify ALLOW FILTERING. Normally, adding ALLOW FILTERING opens the user to the potential performance pitfalls of such a query. With SASI, while the requirement to include ALLOW FILTERING remains, to reduce modifications to the grammar, the performance pitfalls do not exist because filtering is not performed.

SASI AND non-indexed column

SASI also supports filtering on non-indexed columns like age. The expression can only narrow down an existing query using AND. To use a non-indexed column, the SASI index must be used first, and then the non-indexed column can be used to further filter the results.

Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.
Create a SASI index on the name column, a non-primary key TEXT column to use CONTAINS and case sensitivity.
Verify the created indexes by checking the index schema.
Results

Now that you have created the table and index, query using a value for the column. Use a matching pattern along with a non-indexed value for the age column.

Query the table using the name column using a matching pattern of '%A' and age > 20:

Results

Two or more SASI columns

Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.
Create SASI indexes on the name and birthday column, a non-primary key TEXT column to use CONTAINS and case sensitivity and a DATE column using defaults.
Verify the created indexes by checking the index schema.

Now that we have created the table and index, query using a value for the column. Use a matching pattern for name along with range query for the birthday column.

Query the table using the name column using a matching pattern of '%A' and a range query on birthday, both of which use SASI indexes to complete the query.

Results

See CREATE INDEX for more information about creating SASI indexes.

Create SASI index

Prerequisites

Examples

Primary key columns

Single primary key column

Composite partition key columns in primary key

Clustering column in primary key

Text clustering column

Non-primary key columns

Text column

NontokenizingAnalyzer

PREFIX equality, case_sensitive: true

PREFIX with case_sensitive: true

PREFIX with case_sensitive: false

CONTAINS SUFFIX with case_sensitive: true

CONTAINS SUFFIX with case_sensitive: false

CONTAINS SUBSTRING with case_sensitive: true

StandardAnalyzer

Stemming verbs

Delimiter based Tokenization Analysis

Non-text columns

INT column with `PREFIX` mode

Compound queries using two or more SASI indexes

SASI AND non-indexed column

Two or more SASI columns

Was this helpful?

Give Feedback