Create SASI index

Create indexes on one or more columns after defining a table. Secondary indexes created with SASI can be used to query a table using any column other than the table’s partition key.

Do not create a secondary index (2i) or a storage-attached index (SAI) on a table column, if you are creating a SASI index. CQL will be harder to tune and understanding performance will be more difficult if you create both types of indexes on the same table. If you create a secondary index (2i), SSTable-attached index (SASI), or a storage-attached index (SAI) on a table, do not add a DSE Search index to the same table column. See the difference between these index types in the overview.

To create an SASI index, define the table name and column name for the column to be indexed. Index name is optional. Index names are unique per keyspace. The index name must be a unique identifier for the index for each table within a keyspace. Enclose in quotes to use special characters or preserve capitalization. If you do not specify an index name, CQL generates one for you, with the pattern <table_name>_<column_name>_idx. This requirement is true for all indexes.

Examples

Let’s start with some simple SASI examples.

Primary key columns

Make an index on primary key columns, either composite partition key columns or clustering columns.

Single primary key column

This index creation will fail, because the table has a single PRIMARY KEY column. Such indexes are not allowed.

  1. Create a table called `cycling.cyclist_base with a primary key if it doesn’t already exist.

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
Create a SASI index on the id column.

+

  CREATE CUSTOM INDEX IF NOT EXISTS id_sasi_idx 
    ON cycling.cyclist_base (cid)
    USING 'org.apache.cassandra.index.sasi.SASIIndex';

+ .Results

Details
ConfigurationException: partition key columns are not yet supported by SASI

This index creation will fail, because the id column is the primary key, as well as the single partition key. Such indexes are not allowed.

Composite partition key columns in primary key

A composite partition key uses two or more columns to define the partition key in the PRIMARY KEY definition. This index creation will fail, because SASI indexes is not allowed on PRIMARY KEY partition key columns.

  1. Create a table called cycling.rank_by_year_and_name with a composite partition key and a clustering column if it doesn’t already exist.

    CREATE TABLE IF NOT EXISTS cycling.rank_by_year_and_name (
      race_year int,
      race_name text,
      cyclist_name text,
      rank int,
      PRIMARY KEY ((race_year, race_name), rank)
    );
  2. Create a SASI index on the race_year and race_name columns These two columns are the composite partition key in the PRIMARY KEY definition.

    CREATE CUSTOM INDEX race_year_sasi_idx ON cycling.rank_by_year_and_name (race_year) 
      USING 'org.apache.cassandra.index.sasi.SASIIndex';
    CREATE CUSTOM INDEX race_name_sasi_idx ON cycling.rank_by_year_and_name (race_name) 
      USING 'org.apache.cassandra.index.sasi.SASIIndex';
    Results
    ConfigurationException: partition key columns are not yet supported by SASI

    You can see that partition key SASI indexes are not allowed.

Let’s try with the last part of a PRIMARY KEY definition, the clustering column.

Clustering column in primary key

Text clustering column

A clustering column sorts the rows within a partition. Clustering columns are part of the PRIMARY KEY definition. To filter a query using just a clustering column, you must create a SASI index on the column that you wish to use for the query.

  1. Create a table called cycling.country_flag with a primary key that includes a clustering column if it doesn’t already exist.

    CREATE TABLE IF NOT EXISTS cycling.country_flag (
      country text,
      cyclist_name text,
      flag int STATIC,
      PRIMARY KEY (country, cyclist_name)
    );
  2. Create a SASI index on the cyclist_name TEXT clustering column.

      CREATE CUSTOM INDEX cyclist_name_sasi_idx ON cycling.country_flag (cyclist_name) 
        USING 'org.apache.cassandra.index.sasi.SASIIndex';
  3. Verify the created indexes by checking the index schema.

      DESCRIBE INDEX cycling.cyclist_name_sasi_idx;
    Results
    CREATE CUSTOM INDEX cyclist_name_sasi_idx ON cycling.country_flag (cyclist_name) USING 'org.apache.cassandra.index.sasi.SASIIndex';

Now that we have created the table and index, let’s try some queries.

  • The first example tab shows a query with only the clustering column.

  • The second example tab displays a failing query due to case sensitivity.

  • The third example tab shows a query with the clustering column and the partition key column.

  • SELECT 1 - cyclist_name

  • SELECT 2 - ERROR

  • SELECT 3 - AND

Query the table using only the cyclist_name column. The SASI index was created with the default mode of PREFIX, so the query can use the LIKE operator.

  SELECT * FROM cycling.country_flag WHERE cyclist_name LIKE 'Jac%';
Results
 country | cyclist_name | flag
---------+--------------+------
 Belgium |      Jacques |    1

(1 rows)

Warnings :
SASI index was enabled for 'cycling.country_flag'. SASI is still in beta, take extra caution when using it in production.

Query the table using the cyclist_name column, but using a matching pattern of 'jac%' instead of 'Jac%'.

  SELECT * FROM cycling.country_flag WHERE cyclist_name LIKE 'jac%';
Results
 country | cyclist_name | flag
---------+--------------+------

(0 rows)

Warnings :
SASI index was enabled for 'cycling.country_flag'. SASI is still in beta, take extra caution when using it in production.

Query the table using the cyclist_name column and the country partition key column.

  SELECT * FROM cycling.country_flag WHERE cyclist_name 
    LIKE 'Jac%' AND country = 'USA';
Results
 country | cyclist_name | flag
---------+--------------+------

(0 rows)

Warnings :
SASI index was enabled for 'cycling.country_flag'. SASI is still in beta, take extra caution when using it in production.

Non-primary key columns

Indexes can be created on non-primary key columns of almost any type.

Text column

TEXT column SASI indexes can be created with either the default NontokenizingAnalyzer or set to StandardAnalyzer and DelimiterAnalyzer.

The NontokenizingAnalyzer is the default, and is used for non-tokenized searches. Using the NontokenizingAnalyzer, SASI indexes created on TEXT columns have two modes: PREFIX and CONTAINS.

  • The PREFIX mode is used for queries that use the LIKE operator with a prefix pattern, LIKE 'prefix%' or an equality operator, = 'value'. This mode is the default if no mode is specified.

  • The CONTAINS mode is used for queries that use the LIKE operator with a pattern that contains a suffix, LIKE '%suffix' or a substring, LIKE '%substring%' as well as the prefix and equality operators.

Indexes using the CONTAINS mode have a 4-6X larger stored index file size, so although prefixes and equalities can be created, there is a disadvantage.

Three options exist if the analyzer is set to NontokenizingAnalyzer (the default):

  • The case_sensitive option is set to true by default, but can be set to false to allow case-insensitive searches.

  • The normalize_lowercase option is set to false by default, but can be set to false.

  • The normalize_uppercase option is set to false by default, but can be set to false.

The StandardAnalyzer is used for tokenized searches. This analyzer is used for tokenized searches that permit stemming, and is also used for non-English languages, such as Chinese, Japanese, and Korean. The analyzer must be set to True to use the StandardAnalyzer.

Five options exist if the analyzer is set to StandardAnalyzer:

  • The tokenization_locale option is set to en by default, but can be set to other languages.

  • The tokenization_enable_stemming option is set to false by default, but can be set to true.

  • The tokenization_skip_stop_words option is set to false by default, but can be set to true.

  • The tokenization_normalize_lowercase option is set to false by default, but can be set to false.

  • The tokenization_normalize_uppercase option is set to false by default, but can be set to false.

Finally, the DelimiterAnalyzer is used for tokenized searches that store a column as a delimited list of values.

There is one additional option if the analyzer is set to DelimiterAnalyzer besides the options for StandardAnalyzer:

  • The delimiter option is set to ',' by default, but can be set to other delimiters.

Several examples using many of the modes and analyzers are illustrated here.

NontokenizingAnalyzer

PREFIX equality, case_sensitive: true

Equalities can be used with the PREFIX mode, and the case_sensitive option is set to true by default, in this example.

  1. Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Create a SASI index on the name column, a non-primary key TEXT column to use PREFIX and case sensitivity.

      CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx 
        ON cycling.cyclist_base (name)
        USING 'org.apache.cassandra.index.sasi.SASIIndex';
  3. Verify the created index by checking the index schema.

      DESCRIBE INDEX cycling.name_sasi_idx;
    Results
    CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex';

Now that we have created the table and index, query using a value for the column.

Query the table using the name column:

  SELECT * FROM cycling.cyclist_base WHERE name = 'Joakim BUKDAL';
Results
 cid                                  | age | birthday   | country | name
--------------------------------------+-----+------------+---------+---------------
 862cc51f-00a1-4d5a-976b-a359cab7300e |  20 | 1994-09-04 | Denmark | Joakim BUKDAL

(1 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

PREFIX with case_sensitive: true

  1. Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Create a SASI index on the name column, a non-primary key TEXT column to use PREFIX and case sensitivity.

      CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx 
        ON cycling.cyclist_base (name)
        USING 'org.apache.cassandra.index.sasi.SASIIndex';
  3. Verify the created indexes by checking the index schema.

      DESCRIBE INDEX cycling.name_sasi_idx;
    Results
    CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex';

Now that we have created the table and index, query using a value for the column. There are two examples, one with a matching pattern and one with a non-matching pattern that involves case sensitivity.

  • SELECT 1 - matching

  • SELECT 2 - non-matching

Query the table using the name column using a matching pattern of 'A%':

  SELECT * FROM cycling.cyclist_base WHERE name LIKE 'A%';
Results
 cid                                  | age | birthday   | country       | name
--------------------------------------+-----+------------+---------------+--------------
 15a116fc-b833-4da6-ab9a-4a7775752836 |  18 | 1997-08-19 | United States | Adrien COSTA
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 |  22 | 1993-06-18 |   New Zealand |   Alex FRAME

(2 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Query the table using the name column using a non-matching pattern of 'a%'.

  SELECT * FROM cycling.cyclist_base WHERE name LIKE 'a%';
Results
 cid | age | birthday | country | name
-----+-----+----------+---------+------

(0 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Silent error, returns no values

This query will return no results, as the pattern is case sensitive.

PREFIX with case_sensitive: false

  1. Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Create a SASI index on the name column, a non-primary key TEXT column to use PREFIX and no case sensitivity.

      CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx 
        ON cycling.cyclist_base (name)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = {
          'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
          'case_sensitive': 'false'
        };
  3. Verify the created indexes by checking the index schema.

      DESCRIBE INDEX cycling.name_sasi_idx;
    Results
    CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};

Now that we have created the table and index, query using a value for the column. There are two examples, each with a matching pattern that shows that no case sensitivity will match both patterns.

  • SELECT 1 - matching

  • SELECT 2 - matching

Query the table using the name column using a matching pattern of 'A%'.

  SELECT * FROM cycling.cyclist_base WHERE name LIKE 'A%';
Results
 cid                                  | age | birthday   | country       | name
--------------------------------------+-----+------------+---------------+--------------
 15a116fc-b833-4da6-ab9a-4a7775752836 |  18 | 1997-08-19 | United States | Adrien COSTA
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 |  22 | 1993-06-18 |   New Zealand |   Alex FRAME

(2 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Query the table using the name column using a matching pattern of 'a%'.

  SELECT * FROM cycling.cyclist_base WHERE name LIKE 'a%';
Results
 cid                                  | age | birthday   | country       | name
--------------------------------------+-----+------------+---------------+--------------
 15a116fc-b833-4da6-ab9a-4a7775752836 |  18 | 1997-08-19 | United States | Adrien COSTA
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 |  22 | 1993-06-18 |   New Zealand |   Alex FRAME

(2 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Now that case sensitivity is not an issue, both patterns will match.

CONTAINS SUFFIX with case_sensitive: true

  1. Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Create a SASI index on the name column, a non-primary key TEXT column to use CONTAINS and no case sensitivity.

      CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx 
        ON cycling.cyclist_base (name)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = {
          'mode': 'CONTAINS'
        };
  3. Verify the created indexes by checking the index schema.

      DESCRIBE INDEX cycling.name_sasi_idx;
    Results
    CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS'};

Now that we have created the table and index, query using a value for the column. There are two examples, one with a matching pattern and one with a non-matching pattern that will fail due to case sensitivity.

  • SELECT 1 - matching

  • SELECT 2 - non-matching

Query the table using the name column using a matching pattern of '%GO':

  SELECT * FROM cycling.cyclist_base WHERE name LIKE '%GO';
Results
 cid                                  | age | birthday   | country | name
--------------------------------------+-----+------------+---------+-----------------
 220844bf-4860-49d6-9a4b-6b5d3a79cbfb |  38 | 1977-07-08 |   Italy | Paolo TIRALONGO

(1 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Query the table using the name column using a non-matching pattern of '%go':

  SELECT * FROM cycling.cyclist_base WHERE name LIKE '%go';
Results
 cid | age | birthday | country | name
-----+-----+----------+---------+------

(0 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Again, the pattern will fail due to case sensitivity.

CONTAINS SUFFIX with case_sensitive: false

  1. Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Create a SASI index on the name column, a non-primary key TEXT column to use CONTAINS and no case sensitivity.

      CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx 
        ON cycling.cyclist_base (name)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = {
          'mode': 'CONTAINS',
          'case_sensitive': 'false'
        };
  3. Verify the created indexes by checking the index schema.

      DESCRIBE INDEX cycling.name_sasi_idx;
    Results
    CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS', 'case_sensitive': 'false'};

Now that we have created the table and index, query using a value for the column. There are two examples, each with a matching pattern that shows that no case sensitivity will match both patterns.

  • SELECT 1 - matching

  • SELECT 2 - matching

Query the table using the name column using a matching pattern of '%GO':

  SELECT * FROM cycling.cyclist_base WHERE name LIKE '%GO';
Results
 cid                                  | age | birthday   | country | name
--------------------------------------+-----+------------+---------+-----------------
 220844bf-4860-49d6-9a4b-6b5d3a79cbfb |  38 | 1977-07-08 |   Italy | Paolo TIRALONGO

(1 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Query the table using the name column using a matching pattern of '%go':

  SELECT * FROM cycling.cyclist_base WHERE name LIKE '%go';
Results
 cid | age | birthday | country | name
-----+-----+----------+---------+------

(0 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Now that case sensitivity is not an issue, both patterns will match.

CONTAINS SUBSTRING with case_sensitive: true

  1. Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Create a SASI index on the name column, a non-primary key TEXT column to use CONTAINS and case sensitivity.

      CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx 
        ON cycling.cyclist_base (name)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = {
          'mode': 'CONTAINS'
        };
  3. Verify the created indexes by checking the index schema.

      DESCRIBE INDEX cycling.name_sasi_idx;
CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS'};

Now that we have created the table and index, query using a value for the column. There are two examples, each with a matching pattern that shows that no case sensitivity will match both patterns.

  • SELECT 1 - matching

  • SELECT 2 - non-matching

Query the table using the name column using a matching pattern of '%am WE%':

  SELECT * FROM cycling.cyclist_base WHERE name LIKE '%am WE%';
Results
 cid                                  | age | birthday   | country       | name
--------------------------------------+-----+------------+---------------+-------------
 18f471bf-f631-4bc4-a9a2-d6f6cf5ea503 |  18 | 1997-03-29 |   Netherlands | Bram WELTEN
 95fed1a4-db7e-4c8d-8b2d-f7f9340f2c0b |  20 | 1993-04-09 | United States |  Zam WEATON

(2 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Query the table using the name column using a non-matching pattern of '%am WE%':

  SELECT * FROM cycling.cyclist_base WHERE name LIKE '%am we%';
Results
 cid | age | birthday | country | name
-----+-----+----------+---------+------

(0 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Silent error, returns no values

This query will return no results, as the pattern is case sensitive.

StandardAnalyzer

Stemming verbs

  1. Create a table called cycling.cyclist_base if it doesn’t already exist.

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Add a bio column, a non-primary key TEXT column to the table to demonstrate this type of index.

      ALTER TABLE cycling.cyclist_base ADD bio text;
  3. Create a SASI index on the bio column, a non-primary key TEXT column using the StandardAnalyzer.

      CREATE CUSTOM INDEX bio_text_analysis_idx ON cycling.cyclist_base (bio) 
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = {
          'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
          'tokenization_enable_stemming': 'true',
          'analyzed': 'true',
          'tokenization_normalize_lowercase': 'true',
          'tokenization_locale': 'en'
        };
  4. Verify the created indexes by checking the index schema.

      DESCRIBE INDEX cycling.bio_text_analysis_sasi_idx;

Now that we have created the table and index, add some data and query using a value for the column.

  • SELECT 1 - stemming

  • SELECT 2 - stemming 2

Query the table using the bio column using a stemming match:

  SELECT name, country, age, bio FROM cycling.cyclist_base 
    WHERE bio LIKE 'winning';
Results
 name         | country       | age | bio
--------------+---------------+-----+-------------------------------------------------------------------------
 Adrien COSTA | United States |  18 |  Professional cyclist, who likes racing in Asia and Europe, wins a lot.
   Zam WEATON | United States |  20 | Semi-pro cyclist, works delivering packages at night and winning races.

(2 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Query the table using the bio column using another stemming match:

  SELECT name, country, age, bio FROM cycling.cyclist_base 
    WHERE bio LIKE 'like';
Results
 name         | country       | age | bio
--------------+---------------+-----+------------------------------------------------------------------------
 Adrien COSTA | United States |  18 | Professional cyclist, who likes racing in Asia and Europe, wins a lot.

(1 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Delimiter based Tokenization Analysis

The DelimiterAnalyzer is used for tokenized searches that store a column as a delimited list of values. This provides an alternative to indexing collections, as delimiter separated text can be indexed without the overhead of CONTAINS mode nor using PREFIX or SUFFIX queries.

  1. Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Alter the table cycling.cyclist_base with TEXT column called aliases to demonstrate this type of index.

      ALTER TABLE cycling.cyclist_base
        ADD aliases text;
  3. Create SASI index on the aliases`column, a non-primary key TEXT column to use the `DelimiterAnalyzer with appropriate options.

      CREATE CUSTOM INDEX delimiter_sasi_idx on cycling.cyclist_base (aliases) 
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = {
          'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer',
          'delimiter': ',',
          'mode': 'prefix',
          'analyzed': 'true'
        };
  4. Verify the created indexes by checking the index schema.

      DESCRIBE INDEX cycling.delimiter_sasi_idx;
CREATE CUSTOM INDEX delimiter_sasi_idx ON cycling.cyclist_base (aliases) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer', 'delimiter': ',', 'mode': 'prefix', 'analyzed': 'true'};

Now that we have created the table and index, query using a value for the column. After updating the table with some data, use a matching pattern for aliases that checks for one of the aliases listed in the column.

Query the table using the aliases column using a matching pattern of Stevie which uses a SASI index to complete the query.

  SELECT * FROM cycling.cyclist_base WHERE aliases LIKE 'Stevie' ALLOW FILTERING;
Results
 cid                                  | age | aliases             | bio  | birthday   | country     | name
--------------------------------------+-----+---------------------+------+------------+-------------+-------------------
 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 |  28 | Steve,Steven,Stevie | null | 1987-06-07 | Netherlands | Steven KRUIKSWIJK

(1 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Non-text columns

All other non-text columns use the default PREFIX or SPARSE mode.

The SPARSE mode has been designed primarily to index very unique values and allow efficient storage and efficient range query. This mode is used for situations where each indexed value has a maximum of five matching rows. If there are more than five matching rows, an error will be returned. These characteristics make the SPARSE mode suitable for indexing columns with unique values, such as timestamps, UUIDs, and other unique identifiers.

In either mode, the operator = can be used for equality queries, and the operators >, >=, <, and can be used for range queries. These are the only operations for non-text columns using SASI indexing.

INT column with PREFIX mode

  1. Create a table called cycling.cyclist_base with INT column called age if it doesn’t already exist.

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Create a SASI index on the age column, a non-primary key INT column using the PREFIX mode.

      CREATE CUSTOM INDEX IF NOT EXISTS age_sasi_idx ON cycling.cyclist_base (age)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = { 
          'mode': 'PREFIX' 
        };
  3. Verify the created indexes by checking the index schema.

      DESCRIBE INDEX cycling.age_sasi_idx;
    Results
    CREATE CUSTOM INDEX age_sasi_idx ON cycling.cyclist_base (age) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX'};

Now that we have created the table and index, query using a value for the column.

  • The first example tab shows a query with the age column in an equality.

  • The second example tab displays a query with the age column in a range query.

  • The third example tab shows a failing query due trying to use two age ranges with an AND operator.

  • SELECT 1 - equality

  • SELECT 2 - inequality

  • SELECT 3 - AND inequality failure

Query the table using the age column using an equality:

  SELECT * FROM cycling.cyclist_base WHERE age = 28;
Results
 cid                                  | age | aliases             | bio  | birthday   | country     | name
--------------------------------------+-----+---------------------+------+------------+-------------+-------------------
 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 |  28 | Steve,Steven,Stevie | null | 1987-06-07 | Netherlands | Steven KRUIKSWIJK

(1 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Query the table using the age column using an inequality range:

  SELECT * FROM cycling.cyclist_base WHERE age >= 20;
Results
 cid                                  | age | aliases             | bio                                                                     | birthday   | country       | name
--------------------------------------+-----+---------------------+-------------------------------------------------------------------------+------------+---------------+-------------------
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 |  22 |                null |                                                                    null | 1993-06-18 |   New Zealand |        Alex FRAME
 c9c9c484-5e4a-4542-8203-8d047a01b8a8 |  27 |                null |                                                                    null | 1987-09-04 |        Brazil |   Cristian EGIDIO
 862cc51f-00a1-4d5a-976b-a359cab7300e |  20 |                null |                                                                    null | 1994-09-04 |       Denmark |     Joakim BUKDAL
 220844bf-4860-49d6-9a4b-6b5d3a79cbfb |  38 |                null |                                                                    null | 1977-07-08 |         Italy |   Paolo TIRALONGO
 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 |  28 | Steve,Steven,Stevie |                                                                    null | 1987-06-07 |   Netherlands | Steven KRUIKSWIJK
 95fed1a4-db7e-4c8d-8b2d-f7f9340f2c0b |  20 |                null | Semi-pro cyclist, works delivering packages at night and winning races. | 1993-04-09 | United States |        Zam WEATON

(6 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Query the table using the age column using two inequality ranges with an AND operator:

  SELECT * FROM cycling.cyclist_base WHERE age >= 28 AND age < 20;
Results
Fails because of the disjoint ranges

Even adding ALLOW FILTERING will not allow this query to succeed.

DATE column with SPARSE mode

  1. Create a table called cycling.cyclist_base with DATE column called birthday if it doesn’t already exist.

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Create a SASI index on the birthday column, a non-primary key DATE column using the SPARSE mode.

      CREATE CUSTOM INDEX IF NOT EXISTS birthday_sasi_idx 
        ON cycling.cyclist_base (birthday)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = { 
          'mode': 'SPARSE' 
        };
  3. Verify the created indexes by checking the index schema.

      DESCRIBE INDEX cycling.age_sasi_idx;
CREATE CUSTOM INDEX age_sasi_idx ON cycling.cyclist_base (age) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX'};

Now that we have created the table and index, query using a value for the column.

  • SELECT 1 - equality

  • SELECT 2 - inequality

Query the table using the birthday column using an equality:

  SELECT * FROM cycling.cyclist_base WHERE birthday = '1987-06-07';
Results
 cid                                  | age | aliases             | bio  | birthday   | country     | name
--------------------------------------+-----+---------------------+------+------------+-------------+-------------------
 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 |  28 | Steve,Steven,Stevie | null | 1987-06-07 | Netherlands | Steven KRUIKSWIJK

(1 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Query the table using the birthday column using an inequality range:

  SELECT * FROM cycling.cyclist_base WHERE birthday >= '1997-03-29';
Results
 cid                                  | age | aliases | bio                                                                    | birthday   | country       | name
--------------------------------------+-----+---------+------------------------------------------------------------------------+------------+---------------+---------------
 15a116fc-b833-4da6-ab9a-4a7775752836 |  18 |    null | Professional cyclist, who likes racing in Asia and Europe, wins a lot. | 1997-08-19 | United States |  Adrien COSTA
 1c526849-d3a2-42a3-bcf9-7903c80b3d16 |  19 |    null |                                                                   null | 1998-12-23 |     Australia | Kanden GROVES
 18f471bf-f631-4bc4-a9a2-d6f6cf5ea503 |  18 |    null |                                                                   null | 1997-03-29 |   Netherlands |   Bram WELTEN

(3 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Compound queries using two or more SASI indexes

SASI supports queries with multiple predicates. However, due to the nature of the default indexing implementation, CQL requires the user to specify ALLOW FILTERING. Normally, adding ALLOW FILTERING opens the user to the potential performance pitfalls of such a query. With SASI, while the requirement to include ALLOW FILTERING remains, to reduce modifications to the grammar, the performance pitfalls do not exist because filtering is not performed.

SASI AND non-indexed column

SASI also supports filtering on non-indexed columns like age. The expression can only narrow down an existing query using AND. To use a non-indexed column, the SASI index must be used first, and then the non-indexed column can be used to further filter the results.

  1. Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Create a SASI index on the name column, a non-primary key TEXT column to use CONTAINS and case sensitivity.

      CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx 
        ON cycling.cyclist_base (name)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = {
          'mode': 'CONTAINS'
        };
  3. Verify the created indexes by checking the index schema.

      DESCRIBE INDEX cycling.name_sasi_idx;
    Results
    CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS'};

Now that you have created the table and index, query using a value for the column. Use a matching pattern along with a non-indexed value for the age column.

Query the table using the name column using a matching pattern of '%A' and age > 20:

  SELECT * FROM cycling.cyclist_base WHERE name LIKE 'A%' AND age > 20
    ALLOW FILTERING;
Results
 cid                                  | age | birthday   | country     | name
--------------------------------------+-----+------------+-------------+------------
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 |  22 | 1993-06-18 | New Zealand | Alex FRAME

(1 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Two or more SASI columns

  1. Create a table called cycling.cyclist_base with TEXT column called name if it doesn’t already exist.

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Create SASI indexes on the name and birthday column, a non-primary key TEXT column to use CONTAINS and case sensitivity and a DATE column using defaults.

      CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx 
        ON cycling.cyclist_base (name)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = {
          'mode': 'CONTAINS'
        };
      CREATE CUSTOM INDEX IF NOT EXISTS birthday_sasi_idx 
        ON cycling.cyclist_base (birthday)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = { 
          'mode': 'SPARSE' 
        };
  3. Verify the created indexes by checking the index schema.

    DESCRIBE TABLE cycling.cyclist_base;
CREATE TABLE cycling.cyclist_base (
    cid uuid PRIMARY KEY,
    age int,
    birthday date,
    country text,
    name text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';


CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS'};


CREATE CUSTOM INDEX age_sasi_idx ON cycling.cyclist_base (age) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX'};

Now that we have created the table and index, query using a value for the column. Use a matching pattern for name along with range query for the birthday column.

Query the table using the name column using a matching pattern of '%A' and a range query on birthday, both of which use SASI indexes to complete the query.

  SELECT * FROM cycling.cyclist_base WHERE name LIKE '%A' AND birthday > '1987-09-04' 
    ALLOW FILTERING;
Results
 cid                                  | age | birthday   | country       | name
--------------------------------------+-----+------------+---------------+--------------
 15a116fc-b833-4da6-ab9a-4a7775752836 |  18 | 1997-08-19 | United States | Adrien COSTA

(1 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

See CREATE INDEX for more information about creating SASI indexes.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com