Create SASI index
Astra DB and DSE 6.9 do not support SSTable-Attached Secondary Indexing (SASI). SASI is an experimental feature in Apache Cassandra® and Hyper-Converged Database, and it is a beta feature in DataStax Enterprise 6.8. DataStax does not recommend using SASI in production. Instead, use Storage Attached Indexes (SAI) or other supported methods (if appropriate) for indexing in production databases. |
SSTable-Attached Indexing (SASI) is a highly-scalable, per-SSTable index for Cassandra and DataStax Enterprise databases. SSTable Attached Secondary Indexes (SASI) are implemented as a corresponding index file for each SSTable created on disk. SSTables are created during normal flush from memtable to disk, during compaction, and during streaming operations (node joining or being decommissioned). SASI enables full text search as well as faster multi-criteria search in CQL.
SASI can be less resource-intensive, using less memory, disk, and CPU than the built-in secondary index (2i) implementation.
Providing a different functionality than 2i or SAI, SASI allows queries that filter using partial or full text matches.
SASI enables querying with prefix, suffix, or substrings on strings, similar to the SQL implementation of LIKE = "foo%"
, LIKE = "%foo"
, or LIKE = "%foo%"
in SELECT
queries.
It also supports SPARSE
indexing to improve performance of querying large, dense number ranges such as time series data.
Enable SASI
SASI is disabled by default in databases that support SASI.
To enable SASI, set the sasi_indexes_enabled: true
in the cassandra.yaml
file.
Create and query with SASI indexes
After you create a table, you can create SASI indexes on clustering key columns and any non-primary key columns except collection (map
, list
, set
) columns.
Don’t create multiple index types on the same table.
To create an SASI index, use CREATE INDEX
.
At minimum, you must define the table name and column name to be indexed.
Index name is optional, but the default name might not be descriptive enough. Index names are unique per keyspace because indexes are created at the keyspace level and not at the table level.
The index name must be a unique identifier for the index for each table within a keyspace. Enclose in quotes to use special characters or preserve capitalization. If you do not specify an index name, CQL generates one for you, with the pattern <table_name>_<column_name>_idx. This requirement is true for all indexes.
If you use IF NOT EXISTS
in a CREATE [CUSTOM] INDEX
command, the command fails silently if an index with the same name already exists in the keyspace.
If you want the command to return an error when an index with the same name already exists, don’t use IF NOT EXISTS
.
Additional options are available, as described in the reference for CREATE INDEX
.
Queries on SASI indexes can use the following filter operators and conditions:
-
AND
logic -
Numeric range
-
Text equality
-
CONTAINS
logic forLIKE
queries -
PREFIX
logic forLIKE
queries -
Tokenized data
-
Row-aware query path
-
Case-sensitivity (optional)
-
Unicode normalization (optional)
For example, the following query uses a LIKE
operator with a matching pattern on a SASI indexed column:
SELECT * FROM cycling.cyclist_base WHERE name LIKE 'A%';
Results
cid | age | birthday | country | name
--------------------------------------+-----+------------+---------------+--------------
15a116fc-b833-4da6-ab9a-4a7775752836 | 18 | 1997-08-19 | United States | Adrien COSTA
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 22 | 1993-06-18 | New Zealand | Alex FRAME
(2 rows)
Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
Primary key columns
A SASI index can only be created on clustering columns in a primary key because SASI indexes aren’t allowed on single primary key columns or composite partition key columns. For more information about primary keys, see Table concepts.
To filter a query using only a clustering column, you must create a SASI index on the column that you want to use for the query. For example:
-
Create a table with a primary key that includes a clustering column:
CREATE TABLE IF NOT EXISTS cycling.country_flag ( country text, cyclist_name text, flag int STATIC, PRIMARY KEY (country, cyclist_name) );
-
Create a SASI index on the clustering column. This example creates an index on the
cyclist_name
clustering column, which is a text column.CREATE CUSTOM INDEX cyclist_name_sasi_idx ON cycling.country_flag (cyclist_name) USING 'org.apache.cassandra.index.sasi.SASIIndex';
-
Verify the created index by checking the index schema:
DESCRIBE INDEX cycling.cyclist_name_sasi_idx;
Results
CREATE CUSTOM INDEX cyclist_name_sasi_idx ON cycling.country_flag (cyclist_name) USING 'org.apache.cassandra.index.sasi.SASIIndex';
-
Use the index to query the table:
Query with only the clustering column: The SASI index was created with the default mode of
PREFIX
, so the query can use theLIKE
operator.SELECT * FROM cycling.country_flag WHERE cyclist_name LIKE 'Jac%';
Results
country | cyclist_name | flag ---------+--------------+------ Belgium | Jacques | 1 (1 rows) Warnings : SASI index was enabled for 'cycling.country_flag'. SASI is still in beta, take extra caution when using it in production.
Query with the indexed clustering column and the partition key column:
SELECT * FROM cycling.country_flag WHERE cyclist_name LIKE 'Jac%' AND country = 'USA';
Results
country | cyclist_name | flag ---------+--------------+------ (0 rows) Warnings : SASI index was enabled for 'cycling.country_flag'. SASI is still in beta, take extra caution when using it in production.
Case sensitivity can cause queries to fail:
SELECT * FROM cycling.country_flag WHERE cyclist_name LIKE 'jac%';
Results
country | cyclist_name | flag ---------+--------------+------ (0 rows) Warnings : SASI index was enabled for 'cycling.country_flag'. SASI is still in beta, take extra caution when using it in production.
Text column analyzers
SASI indexes on non-primary key text
columns can be created with either NontokenizingAnalyzer
(default), StandardAnalyzer
, or DelimiterAnalyzer
.
-
NontokenizingAnalyzer (default)
-
StandardAnalyzer
-
DelimiterAnalyzer
The NontokenizingAnalyzer
is the default, and is used for non-tokenized searches.
To create a SASI index with NontokenizingAnalyzer
, add USING 'org.apache.cassandra.index.sasi.SASIIndex' and `WITH OPTIONS
to your CREATE INDEX
statement.
The options for NontokenizingAnalyzer
are as follows:
-
mode
: Must be eitherPREFIX
(default) orCONTAINS
.-
PREFIX
(default): For queries that use theLIKE
operator with a prefix pattern (LIKE 'prefix%'
) or an equality operator (= 'value'
). This mode is the default if no mode is specified. -
CONTAINS
: For queries that use theLIKE
operator with a pattern that contains a suffix (LIKE '%suffix'
) or a substring (LIKE '%substring%'
) in addition to the prefix and equality operators. The disadvantage toCONTAINS
mode is that the stored index file is four to six times larger thanPREFIX
mode.
-
-
case_sensitive
: Iftrue
(default), queries on the index are case sensitive. Iffalse
, queries on the index are case insensitive. -
normalize_lowercase
: Whether to normalize all characters as lowercase. The default isfalse
(not normalized). -
normalize_uppercase
: Whether to normalize all characters as uppercase. The default isfalse
(not normalized).
The following NontokenizingAnalyzer
examples use this table:
CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
cid UUID PRIMARY KEY,
name text,
age int,
birthday date,
country text
);
Example: NontokenizingAnalyzer in PREFIX mode with case sensitivity
This example demonstrates how case sensitivity affects queries on SASI indexes in PREFIX
mode.
Equality (=
) and LIKE
operators are used.
-
Create a SASI index on the
name
column withPREFIX
mode. Because the default setting forcase_sensitive
istrue
(enabled), you don’t need to specify it inWITH OPTIONS
.CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex';
-
Verify the created index by checking the index schema:
DESCRIBE INDEX cycling.name_sasi_idx;
-
Query the table using a
LIKE name = 'string'
clause for exact (equality) matching:SELECT * FROM cycling.cyclist_base WHERE name = 'Joakim BUKDAL';
The response includes rows that contain the exact string in the
name
column. -
Query the table using
WHERE name LIKE 'prefix%'
clauses with an uppercase and lowercase pattern.Uppercase pattern
'A%'
:SELECT * FROM cycling.cyclist_base WHERE name LIKE 'A%';
The query succeeds because the pattern matches the case of the stored values:
cid | age | birthday | country | name --------------------------------------+-----+------------+---------------+-------------- 15a116fc-b833-4da6-ab9a-4a7775752836 | 18 | 1997-08-19 | United States | Adrien COSTA e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 22 | 1993-06-18 | New Zealand | Alex FRAME (2 rows) Warnings : SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
Lowercase pattern
'a%'
:SELECT * FROM cycling.cyclist_base WHERE name LIKE 'a%';
No results are found because the pattern does not match the case of the stored values:
cid | age | birthday | country | name -----+-----+----------+---------+------ (0 rows) Warnings : SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production. Silent error, returns no values
Example: NontokenizingAnalyzer in PREFIX mode without case sensitivity
This example demonstrates the behavior of queries when case sensitivity is disabled.
-
Create a SASI index on the
name
column withPREFIX
mode and'case_sensitive': 'false'
:CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false' };
-
Verify the created index by checking the index schema:
DESCRIBE INDEX cycling.name_sasi_idx;
-
Query the table using
WHERE name LIKE
clauses with an uppercase and lowercase pattern.Uppercase pattern 'A%':
SELECT * FROM cycling.cyclist_base WHERE name LIKE 'A%';
Lowercase pattern 'a%':
SELECT * FROM cycling.cyclist_base WHERE name LIKE 'a%';
Both queries return the same results because case sensitivity is disabled:
cid | age | birthday | country | name --------------------------------------+-----+------------+---------------+-------------- 15a116fc-b833-4da6-ab9a-4a7775752836 | 18 | 1997-08-19 | United States | Adrien COSTA e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 22 | 1993-06-18 | New Zealand | Alex FRAME (2 rows) Warnings : SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
Example: NontokenizingAnalyzer in CONTAINS mode with case sensitivity
This example shows the behavior of suffix and substring matching with case sensitivity enabled.
Suffix and substring matching are only available in CONTAINS
mode.
-
Create a SASI index on the
name
column withCONTAINS
mode. Because the default setting forcase_sensitive
istrue
(enabled), you don’t need to specify it inWITH OPTIONS
.CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'CONTAINS' };
-
Verify the created index by checking the index schema:
DESCRIBE INDEX cycling.name_sasi_idx;
-
Query the table using
WHERE name LIKE %suffix
clauses with an uppercase and lowercase pattern.Uppercase pattern
'%GO'
:SELECT * FROM cycling.cyclist_base WHERE name LIKE '%GO';
The query succeeds because the pattern matches the case of the stored values:
cid | age | birthday | country | name --------------------------------------+-----+------------+---------+----------------- 220844bf-4860-49d6-9a4b-6b5d3a79cbfb | 38 | 1977-07-08 | Italy | Paolo TIRALONGO (1 rows) Warnings : SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
Lowercase pattern
'%go'
:SELECT * FROM cycling.cyclist_base WHERE name LIKE '%go';
No results are found because the pattern does not match the case of the stored values:
cid | age | birthday | country | name -----+-----+----------+---------+------ (0 rows) Warnings : SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
-
Query the table using
WHERE name LIKE %substring%
clauses with an uppercase and lowercase pattern.Uppercase pattern
'%am WE%'
:SELECT * FROM cycling.cyclist_base WHERE name LIKE '%am WE%';
The query succeeds because the pattern matches the case of the stored values:
cid | age | birthday | country | name --------------------------------------+-----+------------+---------------+------------- 18f471bf-f631-4bc4-a9a2-d6f6cf5ea503 | 18 | 1997-03-29 | Netherlands | Bram WELTEN 95fed1a4-db7e-4c8d-8b2d-f7f9340f2c0b | 20 | 1993-04-09 | United States | Zam WEATON (2 rows) Warnings : SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
Lowercase pattern
'%am WE%'
:SELECT * FROM cycling.cyclist_base WHERE name LIKE '%am we%';
No results are found because the pattern does not match the case of the stored values:
cid | age | birthday | country | name -----+-----+----------+---------+------ (0 rows) Warnings : SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production. Silent error, returns no values
Example: NontokenizingAnalyzer in CONTAINS mode without case sensitivity
This example shows the behavior of suffix matching with case sensitivity disabled.
Suffix matching is only available in CONTAINS
mode.
-
Create a SASI index on the
name
column withCONTAINS
mode and'case_sensitive': 'false'
:CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'CONTAINS', 'case_sensitive': 'false' };
-
Verify the created index by checking the index schema:
DESCRIBE INDEX cycling.name_sasi_idx;
-
Query the table using
WHERE name LIKE %suffix
clauses with an uppercase and lowercase pattern.Uppercase pattern '%GO':
SELECT * FROM cycling.cyclist_base WHERE name LIKE '%GO';
Lowercase pattern '%go':
SELECT * FROM cycling.cyclist_base WHERE name LIKE '%go';
Both queries return the same results because case sensitivity is disabled:
cid | age | birthday | country | name --------------------------------------+-----+------------+---------+----------------- 220844bf-4860-49d6-9a4b-6b5d3a79cbfb | 38 | 1977-07-08 | Italy | Paolo TIRALONGO (1 rows) Warnings : SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
Use StandardAnalyzer
for tokenized searches that permit stemming or to query languages that don’t use the Latin (Roman) alphabet, such as Chinese or Russian.
To use StandardAnalyzer
, add USING 'org.apache.cassandra.index.sasi.SASIIndex' and `WITH OPTIONS
to your CREATE INDEX
statement.
The options for StandardAnalyzer
are as follows:
-
analyzer_class
: Must beorg.apache.cassandra.index.sasi.analyzer.StandardAnalyzer
. -
analyzed
: Must betrue
to useStandardAnalyzer
. -
tokenization_locale
: Two-letter language code. The default isen
. -
tokenization_enable_stemming
: Whether to enable stemming, which reduces words to their root form (for example,running
,runs
, andrun
are all stemmed asrun
). The default isfalse
(no stemming). -
tokenization_skip_stop_words
: Whether to skip stop words, such as "the" and "an". The default isfalse
(do not skip). -
tokenization_normalize_lowercase
: Whether to normalize all characters as lowercase. The default isfalse
(not normalized). -
tokenization_normalize_uppercase
: Whether to normalize all characters as uppercase. The default isfalse
(not normalized).
The following example uses StandardAnalyzer
with stemming enabled:
-
Create a table called
cycling.cyclist_base
:CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Add a
bio
column as a non-primary key TEXT column:ALTER TABLE cycling.cyclist_base ADD bio text;
-
Create a SASI index on the
bio
column using theStandardAnalyzer
with stemming and lowercase normalization set totrue
:CREATE CUSTOM INDEX bio_text_analysis_idx ON cycling.cyclist_base (bio) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'tokenization_enable_stemming': 'true', 'analyzed': 'true', 'tokenization_normalize_lowercase': 'true', 'tokenization_locale': 'en' };
-
Verify the created indexes by checking the index schema:
DESCRIBE INDEX cycling.bio_text_analysis_sasi_idx;
-
Insert data, and then query the table with the SASI index.
Query the table using the
bio
column using a stemming match:SELECT name, country, age, bio FROM cycling.cyclist_base WHERE bio LIKE 'winning';
Results
name | country | age | bio --------------+---------------+-----+------------------------------------------------------------------------- Adrien COSTA | United States | 18 | Professional cyclist, who likes racing in Asia and Europe, wins a lot. Zam WEATON | United States | 20 | Semi-pro cyclist, works delivering packages at night and winning races. (2 rows) Warnings : SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
Query the table using the
bio
column using another stemming match:SELECT name, country, age, bio FROM cycling.cyclist_base WHERE bio LIKE 'like';
Results
name | country | age | bio --------------+---------------+-----+------------------------------------------------------------------------ Adrien COSTA | United States | 18 | Professional cyclist, who likes racing in Asia and Europe, wins a lot. (1 rows) Warnings : SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
Use DelimiterAnalyzer
for tokenized searches that store a column as a delimited list of values.
This can be used as an alternative to indexing collection types, and delimiter separated text can be indexed without the overhead of CONTAINS
mode nor using PREFIX
or SUFFIX
queries.
To use DelimiterAnalyzer
, add USING 'org.apache.cassandra.index.sasi.SASIIndex' and `WITH OPTIONS
to your CREATE INDEX
statement.
The options for DelimiterAnalyzer
are as follows:
-
analyzer_class
: Must beorg.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer
. -
analyzed
: Must betrue
to useDelimiterAnalyzer
. -
delimiter
: Character used to separate values in the column. The default is comma (','
). -
tokenization_locale
: Two-letter language code. The default isen
. -
tokenization_enable_stemming
: Whether to enable stemming. The default isfalse
(no stemming). -
tokenization_skip_stop_words
: Whether to skip stop words, such as "the" and "an". The default isfalse
(do not skip). -
tokenization_normalize_lowercase
: Whether to normalize all characters as lowercase. The default isfalse
(not normalized). -
tokenization_normalize_uppercase
: Whether to normalize all characters as uppercase. The default isfalse
(not normalized).
The following example creates and queries a SASI index using DelimiterAnalyzer
:
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
:CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Add a
text
column calledaliases
:ALTER TABLE cycling.cyclist_base ADD aliases text;
-
Create a SASI index on the
aliases
column with theDelimiterAnalyzer
and appropriate options:CREATE CUSTOM INDEX delimiter_sasi_idx on cycling.cyclist_base (aliases) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer', 'delimiter': ',', 'mode': 'prefix', 'analyzed': 'true' };
-
Verify the created indexes by checking the index schema:
DESCRIBE INDEX cycling.delimiter_sasi_idx;
Results
CREATE CUSTOM INDEX delimiter_sasi_idx ON cycling.cyclist_base (aliases) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer', 'delimiter': ',', 'mode': 'prefix', 'analyzed': 'true'};
-
Add data, and then query the table with the new index. For this example, use a matching pattern for
aliases
that checks for one of the aliases listed in the column, such asStevie
.SELECT * FROM cycling.cyclist_base WHERE aliases LIKE 'Stevie' ALLOW FILTERING;
Results
cid | age | aliases | bio | birthday | country | name --------------------------------------+-----+---------------------+------+------------+-------------+------------------- 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | 28 | Steve,Steven,Stevie | null | 1987-06-07 | Netherlands | Steven KRUIKSWIJK (1 rows) Warnings : SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
Non-text columns (non-primary key)
SASI indexes created on any non-text, non-primary key columns use PREFIX
(default) or SPARSE
mode.
The SPARSE
mode has been designed primarily to index specialized, unique values (timestamps, UUIDs, and other identifiers) while allowing efficient storage and range queries.
This mode is used for situations where each indexed value has a maximum of five matching rows.
If there are more than five matching rows, an error is returned.
In either mode, the operator =
can be used for equality queries, and the operators >
, >=
, <
, and ⇐
can be used for range queries.
These are the only operations for non-text columns using SASI indexing.
Example: Integer column with PREFIX
mode
-
Create a table called
cycling.cyclist_base
with INT column calledage
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create a SASI index on the
age
column, a non-primary key INT column using thePREFIX
mode.CREATE CUSTOM INDEX IF NOT EXISTS age_sasi_idx ON cycling.cyclist_base (age) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'PREFIX' };
-
Verify the created indexes by checking the index schema.
DESCRIBE INDEX cycling.age_sasi_idx;
Results
CREATE CUSTOM INDEX age_sasi_idx ON cycling.cyclist_base (age) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX'};
-
Query the table with the new index.
Equality query on the
age
column:SELECT * FROM cycling.cyclist_base WHERE age = 28;
Results
cid | age | aliases | bio | birthday | country | name --------------------------------------+-----+---------------------+------+------------+-------------+------------------- 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | 28 | Steve,Steven,Stevie | null | 1987-06-07 | Netherlands | Steven KRUIKSWIJK (1 rows) Warnings : SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
Range query on the
age
column:SELECT * FROM cycling.cyclist_base WHERE age >= 20;
Results
cid | age | aliases | bio | birthday | country | name --------------------------------------+-----+---------------------+-------------------------------------------------------------------------+------------+---------------+------------------- e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 22 | null | null | 1993-06-18 | New Zealand | Alex FRAME c9c9c484-5e4a-4542-8203-8d047a01b8a8 | 27 | null | null | 1987-09-04 | Brazil | Cristian EGIDIO 862cc51f-00a1-4d5a-976b-a359cab7300e | 20 | null | null | 1994-09-04 | Denmark | Joakim BUKDAL 220844bf-4860-49d6-9a4b-6b5d3a79cbfb | 38 | null | null | 1977-07-08 | Italy | Paolo TIRALONGO 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | 28 | Steve,Steven,Stevie | null | 1987-06-07 | Netherlands | Steven KRUIKSWIJK 95fed1a4-db7e-4c8d-8b2d-f7f9340f2c0b | 20 | null | Semi-pro cyclist, works delivering packages at night and winning races. | 1993-04-09 | United States | Zam WEATON (6 rows) Warnings : SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
This query fails because it has two age ranges with an
AND
operator. This is impossible, even withALLOW FILTERING
.SELECT * FROM cycling.cyclist_base WHERE age >= 28 AND age < 20;
Results
Fails because of the disjoint ranges
Example: Date column with SPARSE
mode
-
Create a table called
cycling.cyclist_base
with adate
column calledbirthday
:CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create a SASI index on the
birthday
column, a non-primary keydate
column using theSPARSE
mode:CREATE CUSTOM INDEX IF NOT EXISTS birthday_sasi_idx ON cycling.cyclist_base (birthday) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'SPARSE' };
-
Verify the created indexes by checking the index schema:
DESCRIBE INDEX cycling.age_sasi_idx;
Results
CREATE CUSTOM INDEX age_sasi_idx ON cycling.cyclist_base (age) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX'};
-
Query the table with the new index.
Equality query on the
birthday
column:SELECT * FROM cycling.cyclist_base WHERE birthday = '1987-06-07';
Results
cid | age | aliases | bio | birthday | country | name --------------------------------------+-----+---------------------+------+------------+-------------+------------------- 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | 28 | Steve,Steven,Stevie | null | 1987-06-07 | Netherlands | Steven KRUIKSWIJK (1 rows) Warnings : SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
Range query on the
birthday
column:SELECT * FROM cycling.cyclist_base WHERE birthday >= '1997-03-29';
Results
cid | age | aliases | bio | birthday | country | name --------------------------------------+-----+---------+------------------------------------------------------------------------+------------+---------------+--------------- 15a116fc-b833-4da6-ab9a-4a7775752836 | 18 | null | Professional cyclist, who likes racing in Asia and Europe, wins a lot. | 1997-08-19 | United States | Adrien COSTA 1c526849-d3a2-42a3-bcf9-7903c80b3d16 | 19 | null | null | 1998-12-23 | Australia | Kanden GROVES 18f471bf-f631-4bc4-a9a2-d6f6cf5ea503 | 18 | null | null | 1997-03-29 | Netherlands | Bram WELTEN (3 rows) Warnings : SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
Compound queries using two or more SASI indexes
SASI supports queries with multiple predicates.
However, due to the nature of the default indexing implementation, CQL requires the user to specify ALLOW FILTERING
.
Normally, adding ALLOW FILTERING
opens the user to the potential performance pitfalls of such a query.
With SASI, the performance pitfalls don’t exist because the filtering is not performed; however, you must still include ALLOW FILTERING
to reduce modifications to the grammar.
Example: Query a SASI index and a non-indexed column
SASI supports filtering on non-indexed columns.
The expression can only narrow down an existing query using AND
.
To use a non-indexed column, the SASI index must be used first, and then the non-indexed column can be used to further filter the results.
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
:CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create a SASI index on the
name
column, a non-primary key TEXT column to use CONTAINS and case sensitivity:CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'CONTAINS' };
-
Verify the created indexes by checking the index schema:
DESCRIBE INDEX cycling.name_sasi_idx;
ResultsCREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS'};
-
Query the table with a matching pattern on the indexed column and a non-indexed value for the non-indexed column. This example uses the matching pattern of '%A' and
age > 20
:SELECT * FROM cycling.cyclist_base WHERE name LIKE 'A%' AND age > 20 ALLOW FILTERING;
Resultscid | age | birthday | country | name -----+-----+----------+---------+------ (0 rows)
Example: Query two or more SASI indexes
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
:CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create SASI indexes on the
name
column (a non-primary key TEXT column to use CONTAINS and case sensitivity) and thebirthday
column (a non-primary key DATE column using defaults):CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'CONTAINS' }; CREATE CUSTOM INDEX IF NOT EXISTS birthday_sasi_idx ON cycling.cyclist_base (birthday) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'SPARSE' };
-
Verify the created indexes by checking the index schema:
DESCRIBE TABLE cycling.cyclist_base;
ResultsCREATE TABLE cycling.cyclist_base ( cid uuid PRIMARY KEY, age int, birthday date, country text, name text ) WITH additional_write_policy = '99PERCENTILE' AND bloom_filter_fp_chance = 0.01 AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} AND comment = '' AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'} AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1.0 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND nodesync = {'enabled': 'true', 'incremental': 'true'} AND read_repair = 'BLOCKING' AND speculative_retry = '99PERCENTILE'; CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS'}; CREATE CUSTOM INDEX age_sasi_idx ON cycling.cyclist_base (age) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX'};
-
Query the table using the SASI indexes. This example uses a matching pattern for the
name
column ('%A') and a range query for thebirthday
column.SELECT * FROM cycling.cyclist_base WHERE name LIKE '%A' AND birthday > '1987-09-04' ALLOW FILTERING;
Resultscid | age | birthday | country | name --------------------------------------+-----+------------+---------------+-------------- 15a116fc-b833-4da6-ab9a-4a7775752836 | 18 | 1997-08-19 | United States | Adrien COSTA (1 rows) Warnings : SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
Get a SASI index
Verify that a SASI index exists:
DESCRIBE INDEX cycling.name_sasi_idx;
The result includes the CREATE INDEX
statement used to create the index:
CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex';
Alter a SASI index
SASI indexes cannot be altered. If you need to modify an SASI index, you must drop the index, and then create a new index.
-
Drop index:
DROP INDEX IF EXISTS cycling.name_sasi_idx;
-
Create new index:
CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex';
Drop a SASI index
SASI indexes can be dropped (deleted):
DROP INDEX IF EXISTS cycling.name_sasi_idx;