Create SASI index

Astra DB and DSE 6.9 do not support SSTable-Attached Secondary Indexing (SASI). SASI is an experimental feature in Apache Cassandra® and Hyper-Converged Database, and it is a beta feature in DataStax Enterprise 6.8.

DataStax does not recommend using SASI in production. Instead, use Storage Attached Indexes (SAI) or other supported methods (if appropriate) for indexing in production databases.

SSTable-Attached Indexing (SASI) is a highly-scalable, per-SSTable index for Cassandra and DataStax Enterprise databases. SSTable Attached Secondary Indexes (SASI) are implemented as a corresponding index file for each SSTable created on disk. SSTables are created during normal flush from memtable to disk, during compaction, and during streaming operations (node joining or being decommissioned). SASI enables full text search as well as faster multi-criteria search in CQL.

SASI can be less resource-intensive, using less memory, disk, and CPU than the built-in secondary index (2i) implementation. Providing a different functionality than 2i or SAI, SASI allows queries that filter using partial or full text matches. SASI enables querying with prefix, suffix, or substrings on strings, similar to the SQL implementation of LIKE = "foo%", LIKE = "%foo", or LIKE = "%foo%" in SELECT queries. It also supports SPARSE indexing to improve performance of querying large, dense number ranges such as time series data.

Enable SASI

SASI is disabled by default in databases that support SASI. To enable SASI, set the sasi_indexes_enabled: true in the cassandra.yaml file.

Create and query with SASI indexes

After you create a table, you can create SASI indexes on clustering key columns and any non-primary key columns except collection (map, list, set) columns. Don’t create multiple index types on the same table.

To create an SASI index, use CREATE INDEX. At minimum, you must define the table name and column name to be indexed.

Index name is optional, but the default name might not be descriptive enough. Index names are unique per keyspace because indexes are created at the keyspace level and not at the table level.

The index name must be a unique identifier for the index for each table within a keyspace. Enclose in quotes to use special characters or preserve capitalization. If you do not specify an index name, CQL generates one for you, with the pattern <table_name>_<column_name>_idx. This requirement is true for all indexes.

If you use IF NOT EXISTS in a CREATE [CUSTOM] INDEX command, the command fails silently if an index with the same name already exists in the keyspace. If you want the command to return an error when an index with the same name already exists, don’t use IF NOT EXISTS.

Additional options are available, as described in the reference for CREATE INDEX.

Queries on SASI indexes can use the following filter operators and conditions:

  • AND logic

  • Numeric range

  • Text equality

  • CONTAINS logic for LIKE queries

  • PREFIX logic for LIKE queries

  • Tokenized data

  • Row-aware query path

  • Case-sensitivity (optional)

  • Unicode normalization (optional)

For example, the following query uses a LIKE operator with a matching pattern on a SASI indexed column:

  SELECT * FROM cycling.cyclist_base WHERE name LIKE 'A%';
Results
 cid                                  | age | birthday   | country       | name
--------------------------------------+-----+------------+---------------+--------------
 15a116fc-b833-4da6-ab9a-4a7775752836 |  18 | 1997-08-19 | United States | Adrien COSTA
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 |  22 | 1993-06-18 |   New Zealand |   Alex FRAME

(2 rows)

Warnings :
SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Primary key columns

A SASI index can only be created on clustering columns in a primary key because SASI indexes aren’t allowed on single primary key columns or composite partition key columns. For more information about primary keys, see Table concepts.

To filter a query using only a clustering column, you must create a SASI index on the column that you want to use for the query. For example:

  1. Create a table with a primary key that includes a clustering column:

    CREATE TABLE IF NOT EXISTS cycling.country_flag (
      country text,
      cyclist_name text,
      flag int STATIC,
      PRIMARY KEY (country, cyclist_name)
    );
  2. Create a SASI index on the clustering column. This example creates an index on the cyclist_name clustering column, which is a text column.

      CREATE CUSTOM INDEX cyclist_name_sasi_idx ON cycling.country_flag (cyclist_name) 
        USING 'org.apache.cassandra.index.sasi.SASIIndex';
  3. Verify the created index by checking the index schema:

      DESCRIBE INDEX cycling.cyclist_name_sasi_idx;
    Results
    CREATE CUSTOM INDEX cyclist_name_sasi_idx ON cycling.country_flag (cyclist_name) USING 'org.apache.cassandra.index.sasi.SASIIndex';
  4. Use the index to query the table:

    Query with only the clustering column: The SASI index was created with the default mode of PREFIX, so the query can use the LIKE operator.

      SELECT * FROM cycling.country_flag WHERE cyclist_name LIKE 'Jac%';
    Results
     country | cyclist_name | flag
    ---------+--------------+------
     Belgium |      Jacques |    1
    
    (1 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.country_flag'. SASI is still in beta, take extra caution when using it in production.

    Query with the indexed clustering column and the partition key column:

      SELECT * FROM cycling.country_flag WHERE cyclist_name 
        LIKE 'Jac%' AND country = 'USA';
    Results
     country | cyclist_name | flag
    ---------+--------------+------
    
    (0 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.country_flag'. SASI is still in beta, take extra caution when using it in production.

    Case sensitivity can cause queries to fail:

      SELECT * FROM cycling.country_flag WHERE cyclist_name LIKE 'jac%';
    Results
     country | cyclist_name | flag
    ---------+--------------+------
    
    (0 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.country_flag'. SASI is still in beta, take extra caution when using it in production.

Text column analyzers

SASI indexes on non-primary key text columns can be created with either NontokenizingAnalyzer (default), StandardAnalyzer, or DelimiterAnalyzer.

  • NontokenizingAnalyzer (default)

  • StandardAnalyzer

  • DelimiterAnalyzer

The NontokenizingAnalyzer is the default, and is used for non-tokenized searches.

To create a SASI index with NontokenizingAnalyzer, add USING 'org.apache.cassandra.index.sasi.SASIIndex' and `WITH OPTIONS to your CREATE INDEX statement. The options for NontokenizingAnalyzer are as follows:

  • mode: Must be either PREFIX (default) or CONTAINS.

    • PREFIX (default): For queries that use the LIKE operator with a prefix pattern (LIKE 'prefix%') or an equality operator (= 'value'). This mode is the default if no mode is specified.

    • CONTAINS: For queries that use the LIKE operator with a pattern that contains a suffix (LIKE '%suffix') or a substring (LIKE '%substring%') in addition to the prefix and equality operators. The disadvantage to CONTAINS mode is that the stored index file is four to six times larger than PREFIX mode.

  • case_sensitive: If true (default), queries on the index are case sensitive. If false, queries on the index are case insensitive.

  • normalize_lowercase: Whether to normalize all characters as lowercase. The default is false (not normalized).

  • normalize_uppercase: Whether to normalize all characters as uppercase. The default is false (not normalized).

The following NontokenizingAnalyzer examples use this table:

CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
  cid UUID PRIMARY KEY,
  name text,
  age int,
  birthday date,
  country text
);
Example: NontokenizingAnalyzer in PREFIX mode with case sensitivity

This example demonstrates how case sensitivity affects queries on SASI indexes in PREFIX mode. Equality (=) and LIKE operators are used.

  1. Create a SASI index on the name column with PREFIX mode. Because the default setting for case_sensitive is true (enabled), you don’t need to specify it in WITH OPTIONS.

      CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx 
        ON cycling.cyclist_base (name)
        USING 'org.apache.cassandra.index.sasi.SASIIndex';
  2. Verify the created index by checking the index schema:

      DESCRIBE INDEX cycling.name_sasi_idx;
  3. Query the table using a LIKE name = 'string' clause for exact (equality) matching:

      SELECT * FROM cycling.cyclist_base WHERE name = 'Joakim BUKDAL';

    The response includes rows that contain the exact string in the name column.

  4. Query the table using WHERE name LIKE 'prefix%' clauses with an uppercase and lowercase pattern.

    Uppercase pattern 'A%':

      SELECT * FROM cycling.cyclist_base WHERE name LIKE 'A%';

    The query succeeds because the pattern matches the case of the stored values:

     cid                                  | age | birthday   | country       | name
    --------------------------------------+-----+------------+---------------+--------------
     15a116fc-b833-4da6-ab9a-4a7775752836 |  18 | 1997-08-19 | United States | Adrien COSTA
     e7ae5cf3-d358-4d99-b900-85902fda9bb0 |  22 | 1993-06-18 |   New Zealand |   Alex FRAME
    
    (2 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

    Lowercase pattern 'a%':

      SELECT * FROM cycling.cyclist_base WHERE name LIKE 'a%';

    No results are found because the pattern does not match the case of the stored values:

     cid | age | birthday | country | name
    -----+-----+----------+---------+------
    
    (0 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
    
    Silent error, returns no values
Example: NontokenizingAnalyzer in PREFIX mode without case sensitivity

This example demonstrates the behavior of queries when case sensitivity is disabled.

  1. Create a SASI index on the name column with PREFIX mode and 'case_sensitive': 'false':

      CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx 
        ON cycling.cyclist_base (name)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = {
          'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
          'case_sensitive': 'false'
        };
  2. Verify the created index by checking the index schema:

      DESCRIBE INDEX cycling.name_sasi_idx;
  3. Query the table using WHERE name LIKE clauses with an uppercase and lowercase pattern.

    Uppercase pattern 'A%':

      SELECT * FROM cycling.cyclist_base WHERE name LIKE 'A%';

    Lowercase pattern 'a%':

      SELECT * FROM cycling.cyclist_base WHERE name LIKE 'a%';

    Both queries return the same results because case sensitivity is disabled:

     cid                                  | age | birthday   | country       | name
    --------------------------------------+-----+------------+---------------+--------------
     15a116fc-b833-4da6-ab9a-4a7775752836 |  18 | 1997-08-19 | United States | Adrien COSTA
     e7ae5cf3-d358-4d99-b900-85902fda9bb0 |  22 | 1993-06-18 |   New Zealand |   Alex FRAME
    
    (2 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
Example: NontokenizingAnalyzer in CONTAINS mode with case sensitivity

This example shows the behavior of suffix and substring matching with case sensitivity enabled. Suffix and substring matching are only available in CONTAINS mode.

  1. Create a SASI index on the name column with CONTAINS mode. Because the default setting for case_sensitive is true (enabled), you don’t need to specify it in WITH OPTIONS.

      CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx 
        ON cycling.cyclist_base (name)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = {
          'mode': 'CONTAINS'
        };
  2. Verify the created index by checking the index schema:

      DESCRIBE INDEX cycling.name_sasi_idx;
  3. Query the table using WHERE name LIKE %suffix clauses with an uppercase and lowercase pattern.

    Uppercase pattern '%GO':

      SELECT * FROM cycling.cyclist_base WHERE name LIKE '%GO';

    The query succeeds because the pattern matches the case of the stored values:

     cid                                  | age | birthday   | country | name
    --------------------------------------+-----+------------+---------+-----------------
     220844bf-4860-49d6-9a4b-6b5d3a79cbfb |  38 | 1977-07-08 |   Italy | Paolo TIRALONGO
    
    (1 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

    Lowercase pattern '%go':

      SELECT * FROM cycling.cyclist_base WHERE name LIKE '%go';

    No results are found because the pattern does not match the case of the stored values:

     cid | age | birthday | country | name
    -----+-----+----------+---------+------
    
    (0 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
  4. Query the table using WHERE name LIKE %substring% clauses with an uppercase and lowercase pattern.

    Uppercase pattern '%am WE%':

      SELECT * FROM cycling.cyclist_base WHERE name LIKE '%am WE%';

    The query succeeds because the pattern matches the case of the stored values:

     cid                                  | age | birthday   | country       | name
    --------------------------------------+-----+------------+---------------+-------------
     18f471bf-f631-4bc4-a9a2-d6f6cf5ea503 |  18 | 1997-03-29 |   Netherlands | Bram WELTEN
     95fed1a4-db7e-4c8d-8b2d-f7f9340f2c0b |  20 | 1993-04-09 | United States |  Zam WEATON
    
    (2 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

    Lowercase pattern '%am WE%':

      SELECT * FROM cycling.cyclist_base WHERE name LIKE '%am we%';

    No results are found because the pattern does not match the case of the stored values:

     cid | age | birthday | country | name
    -----+-----+----------+---------+------
    
    (0 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.
    
    Silent error, returns no values
Example: NontokenizingAnalyzer in CONTAINS mode without case sensitivity

This example shows the behavior of suffix matching with case sensitivity disabled. Suffix matching is only available in CONTAINS mode.

  1. Create a SASI index on the name column with CONTAINS mode and 'case_sensitive': 'false':

      CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx 
        ON cycling.cyclist_base (name)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = {
          'mode': 'CONTAINS',
          'case_sensitive': 'false'
        };
  2. Verify the created index by checking the index schema:

      DESCRIBE INDEX cycling.name_sasi_idx;
  3. Query the table using WHERE name LIKE %suffix clauses with an uppercase and lowercase pattern.

    Uppercase pattern '%GO':

      SELECT * FROM cycling.cyclist_base WHERE name LIKE '%GO';

    Lowercase pattern '%go':

      SELECT * FROM cycling.cyclist_base WHERE name LIKE '%go';

    Both queries return the same results because case sensitivity is disabled:

     cid                                  | age | birthday   | country | name
    --------------------------------------+-----+------------+---------+-----------------
     220844bf-4860-49d6-9a4b-6b5d3a79cbfb |  38 | 1977-07-08 |   Italy | Paolo TIRALONGO
    
    (1 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Use StandardAnalyzer for tokenized searches that permit stemming or to query languages that don’t use the Latin (Roman) alphabet, such as Chinese or Russian.

To use StandardAnalyzer, add USING 'org.apache.cassandra.index.sasi.SASIIndex' and `WITH OPTIONS to your CREATE INDEX statement. The options for StandardAnalyzer are as follows:

  • analyzer_class: Must be org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer.

  • analyzed: Must be true to use StandardAnalyzer.

  • tokenization_locale: Two-letter language code. The default is en.

  • tokenization_enable_stemming: Whether to enable stemming, which reduces words to their root form (for example, running, runs, and run are all stemmed as run). The default is false (no stemming).

  • tokenization_skip_stop_words: Whether to skip stop words, such as "the" and "an". The default is false (do not skip).

  • tokenization_normalize_lowercase: Whether to normalize all characters as lowercase. The default is false (not normalized).

  • tokenization_normalize_uppercase: Whether to normalize all characters as uppercase. The default is false (not normalized).

The following example uses StandardAnalyzer with stemming enabled:

  1. Create a table called cycling.cyclist_base:

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Add a bio column as a non-primary key TEXT column:

      ALTER TABLE cycling.cyclist_base ADD bio text;
  3. Create a SASI index on the bio column using the StandardAnalyzer with stemming and lowercase normalization set to true:

      CREATE CUSTOM INDEX bio_text_analysis_idx ON cycling.cyclist_base (bio) 
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = {
          'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
          'tokenization_enable_stemming': 'true',
          'analyzed': 'true',
          'tokenization_normalize_lowercase': 'true',
          'tokenization_locale': 'en'
        };
  4. Verify the created indexes by checking the index schema:

      DESCRIBE INDEX cycling.bio_text_analysis_sasi_idx;
  5. Insert data, and then query the table with the SASI index.

    Query the table using the bio column using a stemming match:

      SELECT name, country, age, bio FROM cycling.cyclist_base 
        WHERE bio LIKE 'winning';
    Results
     name         | country       | age | bio
    --------------+---------------+-----+-------------------------------------------------------------------------
     Adrien COSTA | United States |  18 |  Professional cyclist, who likes racing in Asia and Europe, wins a lot.
       Zam WEATON | United States |  20 | Semi-pro cyclist, works delivering packages at night and winning races.
    
    (2 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

    Query the table using the bio column using another stemming match:

      SELECT name, country, age, bio FROM cycling.cyclist_base 
        WHERE bio LIKE 'like';
    Results
     name         | country       | age | bio
    --------------+---------------+-----+------------------------------------------------------------------------
     Adrien COSTA | United States |  18 | Professional cyclist, who likes racing in Asia and Europe, wins a lot.
    
    (1 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Use DelimiterAnalyzer for tokenized searches that store a column as a delimited list of values.

This can be used as an alternative to indexing collection types, and delimiter separated text can be indexed without the overhead of CONTAINS mode nor using PREFIX or SUFFIX queries.

To use DelimiterAnalyzer, add USING 'org.apache.cassandra.index.sasi.SASIIndex' and `WITH OPTIONS to your CREATE INDEX statement. The options for DelimiterAnalyzer are as follows:

  • analyzer_class: Must be org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer.

  • analyzed: Must be true to use DelimiterAnalyzer.

  • delimiter: Character used to separate values in the column. The default is comma (',').

  • tokenization_locale: Two-letter language code. The default is en.

  • tokenization_enable_stemming: Whether to enable stemming. The default is false (no stemming).

  • tokenization_skip_stop_words: Whether to skip stop words, such as "the" and "an". The default is false (do not skip).

  • tokenization_normalize_lowercase: Whether to normalize all characters as lowercase. The default is false (not normalized).

  • tokenization_normalize_uppercase: Whether to normalize all characters as uppercase. The default is false (not normalized).

The following example creates and queries a SASI index using DelimiterAnalyzer:

  1. Create a table called cycling.cyclist_base with TEXT column called name:

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Add a text column called aliases:

      ALTER TABLE cycling.cyclist_base
        ADD aliases text;
  3. Create a SASI index on the aliases column with the DelimiterAnalyzer and appropriate options:

      CREATE CUSTOM INDEX delimiter_sasi_idx on cycling.cyclist_base (aliases) 
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = {
          'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer',
          'delimiter': ',',
          'mode': 'prefix',
          'analyzed': 'true'
        };
  4. Verify the created indexes by checking the index schema:

      DESCRIBE INDEX cycling.delimiter_sasi_idx;
    Results
    CREATE CUSTOM INDEX delimiter_sasi_idx ON cycling.cyclist_base (aliases) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer', 'delimiter': ',', 'mode': 'prefix', 'analyzed': 'true'};
  5. Add data, and then query the table with the new index. For this example, use a matching pattern for aliases that checks for one of the aliases listed in the column, such as Stevie.

      SELECT * FROM cycling.cyclist_base WHERE aliases LIKE 'Stevie' ALLOW FILTERING;
    Results
     cid                                  | age | aliases             | bio  | birthday   | country     | name
    --------------------------------------+-----+---------------------+------+------------+-------------+-------------------
     6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 |  28 | Steve,Steven,Stevie | null | 1987-06-07 | Netherlands | Steven KRUIKSWIJK
    
    (1 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Non-text columns (non-primary key)

SASI indexes created on any non-text, non-primary key columns use PREFIX (default) or SPARSE mode.

The SPARSE mode has been designed primarily to index specialized, unique values (timestamps, UUIDs, and other identifiers) while allowing efficient storage and range queries. This mode is used for situations where each indexed value has a maximum of five matching rows. If there are more than five matching rows, an error is returned.

In either mode, the operator = can be used for equality queries, and the operators >, >=, <, and can be used for range queries. These are the only operations for non-text columns using SASI indexing.

Example: Integer column with PREFIX mode
  1. Create a table called cycling.cyclist_base with INT column called age if it doesn’t already exist.

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Create a SASI index on the age column, a non-primary key INT column using the PREFIX mode.

      CREATE CUSTOM INDEX IF NOT EXISTS age_sasi_idx ON cycling.cyclist_base (age)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = { 
          'mode': 'PREFIX' 
        };
  3. Verify the created indexes by checking the index schema.

      DESCRIBE INDEX cycling.age_sasi_idx;
    Results
    CREATE CUSTOM INDEX age_sasi_idx ON cycling.cyclist_base (age) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX'};
  4. Query the table with the new index.

    Equality query on the age column:

      SELECT * FROM cycling.cyclist_base WHERE age = 28;
    Results
     cid                                  | age | aliases             | bio  | birthday   | country     | name
    --------------------------------------+-----+---------------------+------+------------+-------------+-------------------
     6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 |  28 | Steve,Steven,Stevie | null | 1987-06-07 | Netherlands | Steven KRUIKSWIJK
    
    (1 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

    Range query on the age column:

      SELECT * FROM cycling.cyclist_base WHERE age >= 20;
    Results
     cid                                  | age | aliases             | bio                                                                     | birthday   | country       | name
    --------------------------------------+-----+---------------------+-------------------------------------------------------------------------+------------+---------------+-------------------
     e7ae5cf3-d358-4d99-b900-85902fda9bb0 |  22 |                null |                                                                    null | 1993-06-18 |   New Zealand |        Alex FRAME
     c9c9c484-5e4a-4542-8203-8d047a01b8a8 |  27 |                null |                                                                    null | 1987-09-04 |        Brazil |   Cristian EGIDIO
     862cc51f-00a1-4d5a-976b-a359cab7300e |  20 |                null |                                                                    null | 1994-09-04 |       Denmark |     Joakim BUKDAL
     220844bf-4860-49d6-9a4b-6b5d3a79cbfb |  38 |                null |                                                                    null | 1977-07-08 |         Italy |   Paolo TIRALONGO
     6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 |  28 | Steve,Steven,Stevie |                                                                    null | 1987-06-07 |   Netherlands | Steven KRUIKSWIJK
     95fed1a4-db7e-4c8d-8b2d-f7f9340f2c0b |  20 |                null | Semi-pro cyclist, works delivering packages at night and winning races. | 1993-04-09 | United States |        Zam WEATON
    
    (6 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

    This query fails because it has two age ranges with an AND operator. This is impossible, even with ALLOW FILTERING.

      SELECT * FROM cycling.cyclist_base WHERE age >= 28 AND age < 20;
    Results
    Fails because of the disjoint ranges
Example: Date column with SPARSE mode
  1. Create a table called cycling.cyclist_base with a date column called birthday:

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Create a SASI index on the birthday column, a non-primary key date column using the SPARSE mode:

      CREATE CUSTOM INDEX IF NOT EXISTS birthday_sasi_idx 
        ON cycling.cyclist_base (birthday)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = { 
          'mode': 'SPARSE' 
        };
  3. Verify the created indexes by checking the index schema:

      DESCRIBE INDEX cycling.age_sasi_idx;
    Results
    CREATE CUSTOM INDEX age_sasi_idx ON cycling.cyclist_base (age) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX'};
  4. Query the table with the new index.

    Equality query on the birthday column:

      SELECT * FROM cycling.cyclist_base WHERE birthday = '1987-06-07';
    Results
     cid                                  | age | aliases             | bio  | birthday   | country     | name
    --------------------------------------+-----+---------------------+------+------------+-------------+-------------------
     6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 |  28 | Steve,Steven,Stevie | null | 1987-06-07 | Netherlands | Steven KRUIKSWIJK
    
    (1 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

    Range query on the birthday column:

      SELECT * FROM cycling.cyclist_base WHERE birthday >= '1997-03-29';
    Results
     cid                                  | age | aliases | bio                                                                    | birthday   | country       | name
    --------------------------------------+-----+---------+------------------------------------------------------------------------+------------+---------------+---------------
     15a116fc-b833-4da6-ab9a-4a7775752836 |  18 |    null | Professional cyclist, who likes racing in Asia and Europe, wins a lot. | 1997-08-19 | United States |  Adrien COSTA
     1c526849-d3a2-42a3-bcf9-7903c80b3d16 |  19 |    null |                                                                   null | 1998-12-23 |     Australia | Kanden GROVES
     18f471bf-f631-4bc4-a9a2-d6f6cf5ea503 |  18 |    null |                                                                   null | 1997-03-29 |   Netherlands |   Bram WELTEN
    
    (3 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Compound queries using two or more SASI indexes

SASI supports queries with multiple predicates. However, due to the nature of the default indexing implementation, CQL requires the user to specify ALLOW FILTERING. Normally, adding ALLOW FILTERING opens the user to the potential performance pitfalls of such a query. With SASI, the performance pitfalls don’t exist because the filtering is not performed; however, you must still include ALLOW FILTERING to reduce modifications to the grammar.

Example: Query a SASI index and a non-indexed column

SASI supports filtering on non-indexed columns. The expression can only narrow down an existing query using AND. To use a non-indexed column, the SASI index must be used first, and then the non-indexed column can be used to further filter the results.

  1. Create a table called cycling.cyclist_base with TEXT column called name:

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Create a SASI index on the name column, a non-primary key TEXT column to use CONTAINS and case sensitivity:

      CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx 
        ON cycling.cyclist_base (name)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = {
          'mode': 'CONTAINS'
        };
  3. Verify the created indexes by checking the index schema:

      DESCRIBE INDEX cycling.name_sasi_idx;
    Results
    CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS'};
  4. Query the table with a matching pattern on the indexed column and a non-indexed value for the non-indexed column. This example uses the matching pattern of '%A' and age > 20:

      SELECT * FROM cycling.cyclist_base WHERE name LIKE 'A%' AND age > 20
        ALLOW FILTERING;
    Results
     cid | age | birthday | country | name
    -----+-----+----------+---------+------
    
    (0 rows)
Example: Query two or more SASI indexes
  1. Create a table called cycling.cyclist_base with TEXT column called name:

    CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
      cid UUID PRIMARY KEY,
      name text,
      age int,
      birthday date,
      country text
    );
  2. Create SASI indexes on the name column (a non-primary key TEXT column to use CONTAINS and case sensitivity) and the birthday column (a non-primary key DATE column using defaults):

      CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx 
        ON cycling.cyclist_base (name)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = {
          'mode': 'CONTAINS'
        };
      CREATE CUSTOM INDEX IF NOT EXISTS birthday_sasi_idx 
        ON cycling.cyclist_base (birthday)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = { 
          'mode': 'SPARSE' 
        };
  3. Verify the created indexes by checking the index schema:

    DESCRIBE TABLE cycling.cyclist_base;
    Results
    CREATE TABLE cycling.cyclist_base (
        cid uuid PRIMARY KEY,
        age int,
        birthday date,
        country text,
        name text
    ) WITH additional_write_policy = '99PERCENTILE'
        AND bloom_filter_fp_chance = 0.01
        AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
        AND comment = ''
        AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
        AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
        AND crc_check_chance = 1.0
        AND default_time_to_live = 0
        AND gc_grace_seconds = 864000
        AND max_index_interval = 2048
        AND memtable_flush_period_in_ms = 0
        AND min_index_interval = 128
        AND nodesync = {'enabled': 'true', 'incremental': 'true'}
        AND read_repair = 'BLOCKING'
        AND speculative_retry = '99PERCENTILE';
    
    
    CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS'};
    
    
    CREATE CUSTOM INDEX age_sasi_idx ON cycling.cyclist_base (age) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX'};
  4. Query the table using the SASI indexes. This example uses a matching pattern for the name column ('%A') and a range query for the birthday column.

      SELECT * FROM cycling.cyclist_base WHERE name LIKE '%A' AND birthday > '1987-09-04' 
        ALLOW FILTERING;
    Results
     cid                                  | age | birthday   | country       | name
    --------------------------------------+-----+------------+---------------+--------------
     15a116fc-b833-4da6-ab9a-4a7775752836 |  18 | 1997-08-19 | United States | Adrien COSTA
    
    (1 rows)
    
    Warnings :
    SASI index was enabled for 'cycling.cyclist_base'. SASI is still in beta, take extra caution when using it in production.

Get a SASI index

Verify that a SASI index exists:

  DESCRIBE INDEX cycling.name_sasi_idx;

The result includes the CREATE INDEX statement used to create the index:

CREATE CUSTOM INDEX name_sasi_idx ON cycling.cyclist_base (name) USING 'org.apache.cassandra.index.sasi.SASIIndex';

Alter a SASI index

SASI indexes cannot be altered. If you need to modify an SASI index, you must drop the index, and then create a new index.

  1. Drop index:

      DROP INDEX IF EXISTS cycling.name_sasi_idx;
  2. Create new index:

      CREATE CUSTOM INDEX IF NOT EXISTS name_sasi_idx 
        ON cycling.cyclist_base (name)
        USING 'org.apache.cassandra.index.sasi.SASIIndex';

Drop a SASI index

SASI indexes can be dropped (deleted):

  DROP INDEX IF EXISTS cycling.name_sasi_idx;

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2025 DataStax, an IBM Company | Privacy policy | Terms of use | Manage Privacy Choices

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com