Create SASI index
Create indexes on one or more columns after defining a table. Secondary indexes created with SASI can be used to query a table using any column other than the table’s partition key.
Do not create a secondary index (2i) or a storage-attached index (SAI) on a table column, if you are creating a SASI index. CQL will be harder to tune and understanding performance will be more difficult if you create both types of indexes on the same table. See the difference between these index types in the overview. |
To create an SASI index, define the table name and column name for the column to be indexed. Index name is optional. Index names are unique per keyspace. The index name must be a unique identifier for the index for each table within a keyspace. Enclose in quotes to use special characters or preserve capitalization. If you do not specify an index name, CQL generates one for you, with the pattern <table_name>_<column_name>_idx. This requirement is true for all indexes.
Examples
Let’s start with some simple SASI examples.
Primary key columns
Make an index on primary key columns, either composite partition key columns or clustering columns.
Single primary key column
This index creation will fail, because the table has a single PRIMARY KEY
column.
Such indexes are not allowed.
-
Create a table called `cycling.cyclist_base with a primary key if it doesn’t already exist.
CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
id
column.+
CREATE CUSTOM INDEX IF NOT EXISTS id_sasi_idx
ON cycling.cyclist_base (cid)
USING 'org.apache.cassandra.index.sasi.SASIIndex';
+ .Results
Details
ConfigurationException: partition key columns are not yet supported by SASI
This index creation will fail, because the id
column is the primary key, as well as the single partition key.
Such indexes are not allowed.
Composite partition key columns in primary key
A composite partition key uses two or more columns to define the partition key in the PRIMARY KEY
definition.
This index creation will fail, because SASI indexes is not allowed on PRIMARY KEY
partition key columns.
-
Create a table called
cycling.rank_by_year_and_name
with a composite partition key and a clustering column if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.rank_by_year_and_name ( race_year int, race_name text, cyclist_name text, rank int, PRIMARY KEY ((race_year, race_name), rank) );
-
Create a SASI index on the
race_year
andrace_name
columns These two columns are the composite partition key in thePRIMARY KEY
definition.CREATE CUSTOM INDEX race_year_sasi_idx ON cycling.rank_by_year_and_name (race_year) USING 'org.apache.cassandra.index.sasi.SASIIndex'; CREATE CUSTOM INDEX race_name_sasi_idx ON cycling.rank_by_year_and_name (race_name) USING 'org.apache.cassandra.index.sasi.SASIIndex';
Results
ConfigurationException: partition key columns are not yet supported by SASI
You can see that partition key SASI indexes are not allowed.
Let’s try with the last part of a PRIMARY KEY
definition, the clustering column.
Clustering column in primary key
Text clustering column
A clustering column sorts the rows within a partition. Clustering columns are part of the PRIMARY KEY
definition.
To filter a query using just a clustering column, you must create a SASI index on the column that you wish to use for the query.
-
Create a table called
cycling.country_flag
with a primary key that includes a clustering column if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.country_flag ( country text, cyclist_name text, flag int STATIC, PRIMARY KEY (country, cyclist_name) );
-
Create a SASI index on the
cyclist_name
TEXT clustering column.CREATE CUSTOM INDEX cyclist_name_sasi_idx ON cycling.country_flag (cyclist_name) USING 'org.apache.cassandra.index.sasi.SASIIndex';
-
Verify the created indexes by checking the index schema.
DESCRIBE INDEX cycling.cyclist_name_sasi_idx;
Results
Now that we have created the table and index, let’s try some queries.
-
The first example tab shows a query with only the clustering column.
-
The second example tab displays a failing query due to case sensitivity.
-
The third example tab shows a query with the clustering column and the partition key column.
-
SELECT 1 - cyclist_name
-
SELECT 2 - ERROR
-
SELECT 3 - AND
Query the table using only the cyclist_name
column.
The SASI index was created with the default mode of PREFIX
, so the query can use the LIKE
operator.
SELECT * FROM cycling.country_flag WHERE cyclist_name LIKE 'Jac%';
Results
Query the table using the cyclist_name
column, but using a matching pattern of 'jac%' instead of 'Jac%'.
SELECT * FROM cycling.country_flag WHERE cyclist_name LIKE 'jac%';
Results
Query the table using the cyclist_name
column and the country
partition key column.
SELECT * FROM cycling.country_flag WHERE cyclist_name
LIKE 'Jac%' AND country = 'USA';
Results
Non-primary key columns
Indexes can be created on non-primary key columns of almost any type.
Text column
TEXT column SASI indexes can be created with either the default NontokenizingAnalyzer
or set to StandardAnalyzer
and DelimiterAnalyzer
.
The NontokenizingAnalyzer
is the default, and is used for non-tokenized searches.
Using the NontokenizingAnalyzer
, SASI indexes created on TEXT columns have two modes: PREFIX
and CONTAINS
.
-
The
PREFIX
mode is used for queries that use theLIKE
operator with a prefix pattern,LIKE 'prefix%'
or an equality operator,= 'value'
. This mode is the default if no mode is specified. -
The
CONTAINS
mode is used for queries that use theLIKE
operator with a pattern that contains a suffix,LIKE '%suffix'
or a substring,LIKE '%substring%'
as well as the prefix and equality operators.
Indexes using the CONTAINS
mode have a 4-6X larger stored index file size, so although prefixes and equalities can be created, there is a disadvantage.
Three options exist if the analyzer is set to NontokenizingAnalyzer
(the default):
-
The
case_sensitive
option is set totrue
by default, but can be set tofalse
to allow case-insensitive searches. -
The
normalize_lowercase
option is set tofalse
by default, but can be set tofalse
. -
The
normalize_uppercase
option is set tofalse
by default, but can be set tofalse
.
The StandardAnalyzer
is used for tokenized searches.
This analyzer is used for tokenized searches that permit stemming, and is also used for non-English languages, such as Chinese, Japanese, and Korean.
The analyzer
must be set to True
to use the StandardAnalyzer
.
Five options exist if the analyzer is set to StandardAnalyzer
:
-
The
tokenization_locale
option is set toen
by default, but can be set to other languages. -
The
tokenization_enable_stemming
option is set tofalse
by default, but can be set totrue
. -
The
tokenization_skip_stop_words
option is set tofalse
by default, but can be set totrue
. -
The
tokenization_normalize_lowercase
option is set tofalse
by default, but can be set tofalse
. -
The
tokenization_normalize_uppercase
option is set tofalse
by default, but can be set tofalse
.
Finally, the DelimiterAnalyzer
is used for tokenized searches that store a column as a delimited list of values.
There is one additional option if the analyzer is set to DelimiterAnalyzer
besides the options for StandardAnalyzer
:
-
The
delimiter
option is set to','
by default, but can be set to other delimiters.
Several examples using many of the modes and analyzers are illustrated here.
NontokenizingAnalyzer
PREFIX equality, case_sensitive: true
Equalities can be used with the PREFIX
mode, and the case_sensitive
option is set to true
by default, in this example.
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create a SASI index on the
name
column, a non-primary key TEXT column to use PREFIX and case sensitivity.CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex';
-
Verify the created index by checking the index schema.
DESCRIBE INDEX cycling.name_sasi_idx;
Results
CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex';
Now that we have created the table and index, query using a value for the column.
Query the table using the name
column:
SELECT * FROM cycling.cyclist_base WHERE name = 'Joakim BUKDAL';
Results
cid | age | birthday | country | name
--------------------------------------+-----+------------+---------+---------------
862cc51f-00a1-4d5a-976b-a359cab7300e | 20 | 1994-09-04 | Denmark | Joakim BUKDAL
(1 rows)
PREFIX with case_sensitive: true
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create a SASI index on the
name
column, a non-primary key TEXT column to use PREFIX and case sensitivity.CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex';
-
Verify the created indexes by checking the index schema.
DESCRIBE INDEX cycling.name_sasi_idx;
Results
CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex';
Now that we have created the table and index, query using a value for the column. There are two examples, one with a matching pattern and one with a non-matching pattern that involves case sensitivity.
-
SELECT 1 - matching
-
SELECT 2 - non-matching
Query the table using the name
column using a matching pattern of 'A%':
SELECT * FROM cycling.cyclist_base WHERE name LIKE 'A%';
Results
cid | age | birthday | country | name
--------------------------------------+-----+------------+---------------+--------------
15a116fc-b833-4da6-ab9a-4a7775752836 | 18 | 1997-08-19 | United States | Adrien COSTA
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 22 | 1993-06-18 | New Zealand | Alex FRAME
(2 rows)
Query the table using the name
column using a non-matching pattern of 'a%'.
SELECT * FROM cycling.cyclist_base WHERE name LIKE 'a%';
Results
cid | age | birthday | country | name
-----+-----+----------+---------+------
(0 rows)
Silent error, returns no values
This query will return no results, as the pattern is case sensitive.
PREFIX with case_sensitive: false
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create a SASI index on the
name
column, a non-primary key TEXT column to use PREFIX and no case sensitivity.CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false' };
-
Verify the created indexes by checking the index schema.
DESCRIBE INDEX cycling.name_sasi_idx;
Results
CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
Now that we have created the table and index, query using a value for the column. There are two examples, each with a matching pattern that shows that no case sensitivity will match both patterns.
-
SELECT 1 - matching
-
SELECT 2 - matching
Query the table using the name
column using a matching pattern of 'A%'.
SELECT * FROM cycling.cyclist_base WHERE name LIKE 'A%';
Results
cid | age | birthday | country | name
--------------------------------------+-----+------------+---------------+--------------
15a116fc-b833-4da6-ab9a-4a7775752836 | 18 | 1997-08-19 | United States | Adrien COSTA
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 22 | 1993-06-18 | New Zealand | Alex FRAME
(2 rows)
Query the table using the name
column using a matching pattern of 'a%'.
SELECT * FROM cycling.cyclist_base WHERE name LIKE 'a%';
Results
cid | age | birthday | country | name
--------------------------------------+-----+------------+---------------+--------------
15a116fc-b833-4da6-ab9a-4a7775752836 | 18 | 1997-08-19 | United States | Adrien COSTA
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 22 | 1993-06-18 | New Zealand | Alex FRAME
(2 rows)
Now that case sensitivity is not an issue, both patterns will match.
CONTAINS SUFFIX with case_sensitive: true
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create a SASI index on the
name
column, a non-primary key TEXT column to use CONTAINS and no case sensitivity.CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'CONTAINS' };
-
Verify the created indexes by checking the index schema.
DESCRIBE INDEX cycling.name_sasi_idx;
Results
CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS'};
Now that we have created the table and index, query using a value for the column. There are two examples, one with a matching pattern and one with a non-matching pattern that will fail due to case sensitivity.
-
SELECT 1 - matching
-
SELECT 2 - non-matching
Query the table using the name
column using a matching pattern of '%GO':
SELECT * FROM cycling.cyclist_base WHERE name LIKE '%GO';
Results
cid | age | birthday | country | name
--------------------------------------+-----+------------+---------+-----------------
220844bf-4860-49d6-9a4b-6b5d3a79cbfb | 38 | 1977-07-08 | Italy | Paolo TIRALONGO
(1 rows)
Query the table using the name
column using a non-matching pattern of '%go':
SELECT * FROM cycling.cyclist_base WHERE name LIKE '%go';
Results
cid | age | birthday | country | name
-----+-----+----------+---------+------
(0 rows)
Again, the pattern will fail due to case sensitivity.
CONTAINS SUFFIX with case_sensitive: false
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create a SASI index on the
name
column, a non-primary key TEXT column to use CONTAINS and no case sensitivity.CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'CONTAINS', 'case_sensitive': 'false' };
-
Verify the created indexes by checking the index schema.
DESCRIBE INDEX cycling.name_sasi_idx;
Results
CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'case_sensitive': 'false', 'mode': 'CONTAINS'};
Now that we have created the table and index, query using a value for the column. There are two examples, each with a matching pattern that shows that no case sensitivity will match both patterns.
-
SELECT 1 - matching
-
SELECT 2 - matching
Query the table using the name
column using a matching pattern of '%GO':
SELECT * FROM cycling.cyclist_base WHERE name LIKE '%GO';
Results
cid | age | birthday | country | name
--------------------------------------+-----+------------+---------+-----------------
220844bf-4860-49d6-9a4b-6b5d3a79cbfb | 38 | 1977-07-08 | Italy | Paolo TIRALONGO
(1 rows)
Query the table using the name
column using a matching pattern of '%go':
SELECT * FROM cycling.cyclist_base WHERE name LIKE '%go';
Results
cid | age | birthday | country | name
-----+-----+----------+---------+------
(0 rows)
Now that case sensitivity is not an issue, both patterns will match.
CONTAINS SUBSTRING with case_sensitive: true
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create a SASI index on the
name
column, a non-primary key TEXT column to use CONTAINS and case sensitivity.CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'CONTAINS' };
-
Verify the created indexes by checking the index schema.
DESCRIBE INDEX cycling.name_sasi_idx;
CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS'};
Now that we have created the table and index, query using a value for the column. There are two examples, each with a matching pattern that shows that no case sensitivity will match both patterns.
-
SELECT 1 - matching
-
SELECT 2 - non-matching
Query the table using the name
column using a matching pattern of '%am WE%':
SELECT * FROM cycling.cyclist_base WHERE name LIKE '%am WE%';
Results
cid | age | birthday | country | name
--------------------------------------+-----+------------+---------------+-------------
18f471bf-f631-4bc4-a9a2-d6f6cf5ea503 | 18 | 1997-03-29 | Netherlands | Bram WELTEN
95fed1a4-db7e-4c8d-8b2d-f7f9340f2c0b | 20 | 1993-04-09 | United States | Zam WEATON
(2 rows)
Query the table using the name
column using a non-matching pattern of '%am WE%':
SELECT * FROM cycling.cyclist_base WHERE name LIKE '%am we%';
Results
cid | age | birthday | country | name
-----+-----+----------+---------+------
(0 rows)
Silent error, returns no values
This query will return no results, as the pattern is case sensitive.
StandardAnalyzer
Stemming verbs
-
Create a table called
cycling.cyclist_base
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Add a
bio
column, a non-primary key TEXT column to the table to demonstrate this type of index.ALTER TABLE cycling.cyclist_base ADD bio text;
-
Create a SASI index on the
bio
column, a non-primary key TEXT column using theStandardAnalyzer
.CREATE CUSTOM INDEX bio_text_analysis_idx ON cycling.cyclist_base (bio) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'tokenization_enable_stemming': 'true', 'analyzed': 'true', 'tokenization_normalize_lowercase': 'true', 'tokenization_locale': 'en' };
-
Verify the created indexes by checking the index schema.
DESCRIBE INDEX cycling.bio_text_analysis_sasi_idx;
Now that we have created the table and index, add some data and query using a value for the column.
-
SELECT 1 - stemming
-
SELECT 2 - stemming 2
Query the table using the bio
column using a stemming match:
SELECT name, country, age, bio FROM cycling.cyclist_base
WHERE bio LIKE 'winning';
Results
name | country | age | bio
--------------+---------------+-----+-------------------------------------------------------------------------
Adrien COSTA | United States | 18 | Professional cyclist, who likes racing in Asia and Europe, wins a lot.
Zam WEATON | United States | 20 | Semi-pro cyclist, works delivering packages at night and winning races.
(2 rows)
Query the table using the bio
column using another stemming match:
SELECT name, country, age, bio FROM cycling.cyclist_base
WHERE bio LIKE 'like';
Results
name | country | age | bio
--------------+---------------+-----+------------------------------------------------------------------------
Adrien COSTA | United States | 18 | Professional cyclist, who likes racing in Asia and Europe, wins a lot.
(1 rows)
Delimiter based Tokenization Analysis
The DelimiterAnalyzer
is used for tokenized searches that store a column as a delimited list of values.
This provides an alternative to indexing collections,
as delimiter separated text can be indexed without the overhead of CONTAINS
mode nor using PREFIX
or SUFFIX
queries.
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Alter the table
cycling.cyclist_base
with TEXT column calledaliases
to demonstrate this type of index.ALTER TABLE cycling.cyclist_base ADD aliases text;
-
Create SASI index on the
aliases`column, a non-primary key TEXT column to use the `DelimiterAnalyzer
with appropriate options.CREATE CUSTOM INDEX delimiter_sasi_idx on cycling.cyclist_base (aliases) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer', 'delimiter': ',', 'mode': 'prefix', 'analyzed': 'true' };
-
Verify the created indexes by checking the index schema.
DESCRIBE INDEX cycling.delimiter_sasi_idx;
CREATE CUSTOM INDEX delimiter_sasi_idx ON cycling.cyclist_base (aliases) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'analyzed': 'true', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer', 'delimiter': ',', 'mode': 'prefix'};
Now that we have created the table and index, query using a value for the column.
After updating the table with some data, use a matching pattern for aliases
that checks for one of the aliases listed in the column.
Query the table using the aliases
column using a matching pattern of Stevie
which uses a SASI index to complete the query.
SELECT * FROM cycling.cyclist_base WHERE aliases LIKE 'Stevie' ALLOW FILTERING;
Results
cid | age | aliases | bio | birthday | country | name
--------------------------------------+-----+---------------------+------+------------+-------------+-------------------
6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | 28 | Steve,Steven,Stevie | null | 1987-06-07 | Netherlands | Steven KRUIKSWIJK
(1 rows)
Non-text columns
All other non-text columns use the default PREFIX
or SPARSE
mode.
The SPARSE
mode has been designed primarily to index very unique values and allow efficient storage and efficient range query.
This mode is used for situations where each indexed value has a maximum of five matching rows.
If there are more than five matching rows, an error will be returned.
These characteristics make the SPARSE
mode suitable for indexing columns with unique values, such as timestamps, UUIDs, and other unique identifiers.
In either mode, the operator =
can be used for equality queries, and the operators >
, >=
, <
, and ⇐
can be used for range queries.
These are the only operations for non-text columns using SASI indexing.
INT column with PREFIX
mode
-
Create a table called
cycling.cyclist_base
with INT column calledage
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create a SASI index on the
age
column, a non-primary key INT column using thePREFIX
mode.CREATE CUSTOM INDEX IF NOT EXISTS age_sasi_idx ON cycling.cyclist_base (age) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'PREFIX' };
-
Verify the created indexes by checking the index schema.
DESCRIBE INDEX cycling.age_sasi_idx;
Results
CREATE CUSTOM INDEX age_sasi_idx ON cycling.cyclist_base (age) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX'};
Now that we have created the table and index, query using a value for the column.
-
The first example tab shows a query with the
age
column in an equality. -
The second example tab displays a query with the
age
column in a range query. -
The third example tab shows a failing query due trying to use two age ranges with an
AND
operator.
-
SELECT 1 - equality
-
SELECT 2 - inequality
-
SELECT 3 - AND inequality failure
Query the table using the age
column using an equality:
SELECT * FROM cycling.cyclist_base WHERE age = 28;
Results
cid | age | aliases | bio | birthday | country | name
--------------------------------------+-----+---------------------+------+------------+-------------+-------------------
6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | 28 | Steve,Steven,Stevie | null | 1987-06-07 | Netherlands | Steven KRUIKSWIJK
(1 rows)
Query the table using the age
column using an inequality range:
SELECT * FROM cycling.cyclist_base WHERE age >= 20;
Results
cid | age | aliases | bio | birthday | country | name
--------------------------------------+-----+---------------------+-------------------------------------------------------------------------+------------+---------------+-------------------
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 22 | null | null | 1993-06-18 | New Zealand | Alex FRAME
c9c9c484-5e4a-4542-8203-8d047a01b8a8 | 27 | null | null | 1987-09-04 | Brazil | Cristian EGIDIO
862cc51f-00a1-4d5a-976b-a359cab7300e | 20 | null | null | 1994-09-04 | Denmark | Joakim BUKDAL
220844bf-4860-49d6-9a4b-6b5d3a79cbfb | 38 | null | null | 1977-07-08 | Italy | Paolo TIRALONGO
6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | 28 | Steve,Steven,Stevie | null | 1987-06-07 | Netherlands | Steven KRUIKSWIJK
95fed1a4-db7e-4c8d-8b2d-f7f9340f2c0b | 20 | null | Semi-pro cyclist, works delivering packages at night and winning races. | 1993-04-09 | United States | Zam WEATON
(6 rows)
Query the table using the age
column using two inequality ranges with an AND
operator:
SELECT * FROM cycling.cyclist_base WHERE age >= 28 AND age < 20;
Results
Fails because of the disjoint ranges
Even adding ALLOW FILTERING
will not allow this query to succeed.
DATE column with SPARSE
mode
-
Create a table called
cycling.cyclist_base
with DATE column calledbirthday
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create a SASI index on the
birthday
column, a non-primary key DATE column using theSPARSE
mode.CREATE CUSTOM INDEX IF NOT EXISTS birthday_sasi_idx ON cycling.cyclist_base (birthday) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'SPARSE' };
-
Verify the created indexes by checking the index schema.
DESCRIBE INDEX cycling.age_sasi_idx;
CREATE CUSTOM INDEX age_sasi_idx ON cycling.cyclist_base (age) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX'};
Now that we have created the table and index, query using a value for the column.
-
SELECT 1 - equality
-
SELECT 2 - inequality
Query the table using the birthday
column using an equality:
SELECT * FROM cycling.cyclist_base WHERE birthday = '1987-06-07';
Results
cid | age | aliases | bio | birthday | country | name
--------------------------------------+-----+---------------------+------+------------+-------------+-------------------
6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | 28 | Steve,Steven,Stevie | null | 1987-06-07 | Netherlands | Steven KRUIKSWIJK
(1 rows)
Query the table using the birthday
column using an inequality range:
SELECT * FROM cycling.cyclist_base WHERE birthday >= '1997-03-29';
Results
cid | age | aliases | bio | birthday | country | name
--------------------------------------+-----+---------+------------------------------------------------------------------------+------------+---------------+---------------
15a116fc-b833-4da6-ab9a-4a7775752836 | 18 | null | Professional cyclist, who likes racing in Asia and Europe, wins a lot. | 1997-08-19 | United States | Adrien COSTA
1c526849-d3a2-42a3-bcf9-7903c80b3d16 | 19 | null | null | 1998-12-23 | Australia | Kanden GROVES
18f471bf-f631-4bc4-a9a2-d6f6cf5ea503 | 18 | null | null | 1997-03-29 | Netherlands | Bram WELTEN
(3 rows)
Compound queries using two or more SASI indexes
SASI supports queries with multiple predicates.
However, due to the nature of the default indexing implementation, CQL requires the user to specify ALLOW FILTERING
.
Normally, adding ALLOW FILTERING
opens the user to the potential performance pitfalls of such a query.
With SASI, while the requirement to include ALLOW FILTERING
remains, to reduce modifications to the grammar, the performance pitfalls do not exist because filtering is not performed.
SASI AND non-indexed column
SASI also supports filtering on non-indexed columns like age
.
The expression can only narrow down an existing query using AND
.
To use a non-indexed column, the SASI index must be used first, and then the non-indexed column can be used to further filter the results.
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create a SASI index on the
name
column, a non-primary key TEXT column to use CONTAINS and case sensitivity.CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'CONTAINS' };
-
Verify the created indexes by checking the index schema.
DESCRIBE INDEX cycling.name_sasi_idx;
Results
CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS'};
Now that you have created the table and index, query using a value for the column.
Use a matching pattern along with a non-indexed value for the age
column.
Query the table using the name
column using a matching pattern of '%A' and age > 20:
SELECT * FROM cycling.cyclist_base WHERE name LIKE 'A%' AND age > 20
ALLOW FILTERING;
Results
Two or more SASI columns
-
Create a table called
cycling.cyclist_base
with TEXT column calledname
if it doesn’t already exist.CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );
-
Create SASI indexes on the
name
andbirthday
column, a non-primary key TEXT column to use CONTAINS and case sensitivity and a DATE column using defaults.CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'CONTAINS' }; CREATE CUSTOM INDEX IF NOT EXISTS birthday_sasi_idx ON cycling.cyclist_base (birthday) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'SPARSE' };
-
Verify the created indexes by checking the index schema.
DESCRIBE TABLE cycling.cyclist_base;
CREATE TABLE cycling.cyclist_base (
cid uuid PRIMARY KEY,
age int,
birthday date,
country text,
name text
) WITH additional_write_policy = '99p'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND cdc = false
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy'}
AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND memtable = {}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND extensions = {}
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'BLOCKING'
AND speculative_retry = '99p';
CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS'};
CREATE CUSTOM INDEX age_sasi_idx ON cycling.cyclist_base (age) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX'};
Now that we have created the table and index, query using a value for the column.
Use a matching pattern for name
along with range query for the birthday
column.
Query the table using the name
column using a matching pattern of '%A' and a range query on birthday
, both of which use SASI indexes to complete the query.
SELECT * FROM cycling.cyclist_base WHERE name LIKE '%A' AND birthday > '1987-09-04'
ALLOW FILTERING;
Results
cid | age | birthday | country | name
--------------------------------------+-----+------------+---------------+--------------
15a116fc-b833-4da6-ab9a-4a7775752836 | 18 | 1997-08-19 | United States | Adrien COSTA
(1 rows)
See CREATE INDEX for more information about creating SASI indexes.