CREATE CUSTOM INDEX (SASI)

Generates an SSTable Attached Secondary Index (SASI) on a single table column (experimental).

Upgrade impact: Changes in DSE 6.0 could affect your implementation. See the 6.0 release notes and be sure to follow the upgrade instructions for required actions.
  • The org.apache.cassandra.index.Index interface was modified to comply with core storage engine changes. Updated implementations are required. If unsure, drop all existing custom secondary indexes before upgrading, except DSE Search indexes, which do not need to be replaced. Because custom index implementations must be rewritten for DSE, DataStax Support can help you find a solution.

SASI uses significantly fewer memory, disk, and CPU resources than standard secondary indexes. It enables querying with PREFIX and CONTAINS on strings, similar to the SQL patterns LIKE 'foo%' and LIKE '%foo%'.

Attention: SASI indexes in DSE are experimental. DataStax does not support SASI indexes for production.

For more information about SASI, see Using SASI.

Synopsis

CREATE CUSTOM INDEX [ IF NOT EXISTS ] [ index_name ]
  ON [keyspace_name.]table_name (column_name)
  USING 'org.apache.cassandra.index.sasi.SASIIndex' 
  [ WITH OPTIONS = { option_map } ] ;
Table 1. Legend
Syntax conventions             Description
UPPERCASE                      Literal keyword.
Lowercase                      Not literal.
Italics                        Variable value. Replace with a user-defined value.
[]                             Optional. Square brackets ( [] ) surround optional command arguments. Do not type the square brackets.
( )                            Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.
|                              Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not type the vertical bar.
...                            Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required.
'Literal string'               Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.
{ key : value }                Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the key and the value.
<datatype1,datatype2>          Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.
cql_statement;                 End CQL statement. A semicolon ( ; ) terminates all CQL statements.
[--]                           Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options.
' <schema> ... </schema> '     Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.
@xml_entity='xml_entity_type'  Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrConfig files.
index_name
Optional identifier for index. If no name is specified, the default is used, table_name_column_name_idx. Enclose in quotes to use special characters or preserve capitalization.
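
The naming rules above can be sketched as follows, assuming the cycling.comments table that the example statements on this page target. The quoted name "Commenter_Prefix_Idx" is hypothetical and chosen only for illustration. The two statements are alternatives for the same column and are not meant to be run together.

```cql
-- No index_name given: the default, table_name_column_name_idx, is used
-- (here that would be comments_commenter_idx).
CREATE CUSTOM INDEX ON cycling.comments (commenter)
  USING 'org.apache.cassandra.index.sasi.SASIIndex';

-- Alternative: a quoted (hypothetical) name preserves its capitalization.
CREATE CUSTOM INDEX "Commenter_Prefix_Idx" ON cycling.comments (commenter)
  USING 'org.apache.cassandra.index.sasi.SASIIndex';
```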
OPTIONS

Define options in JSON simple format.

Specifying an analyzer allows:
  • Analyzing and indexing text column data
  • Using word stemming for indexing
  • Specifying words that can be skipped
  • Applying localization based on a specified language
  • Case normalization, like the non-tokenizing analyzer

Analyzer class option

The SASI indexer has two analyzer classes (analyzer_class):
  • org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer (default analyzer)
  • org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer
Specify the class with the analyzer_class option:
'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer'

There are global options that apply to both analyzer classes, and class-specific options for the Standard Analyzer and the Non-tokenizing Analyzer.

Global options

The following options apply to all analyzer classes:
Option                             Description
analyzed                           When true, the literal column is analyzed using the specified analyzer.
is_literal                         Designates a column as literal.
max_compaction_flush_memory_in_mb  Maximum memory, in megabytes (MB), used to buffer index data before it is flushed to disk during compaction.

Standard analyzer options

Default analyzer class. The following options are available for the org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer.
Option                            Description
tokenization_enable_stemming      Reduce words to their base form; for example, "stemmer", "stemming", and "stemmed" are all based on "stem". Default: false.
tokenization_skip_stop_words      Comma-separated list of words to ignore, for example 'and, the, or'.
tokenization_locale               Language code of the column, see List of localization codes. Default: en.
tokenization_normalize_lowercase  Use lowercase. Default: false.
tokenization_normalize_uppercase  Use uppercase. Default: false.

Non-tokenizing analyzer options

The following options are available for the org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer.
Option               Description
normalize_lowercase  Index all strings as lowercase. Default: false.
normalize_uppercase  Index all strings as uppercase. Default: false.
case_sensitive       Set to false to ignore case in matching. Default: true (case-sensitive indexing).
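
As a minimal sketch of the normalization options, the statement below indexes every value of commenter in lowercase. It assumes the cycling.comments table used in the examples on this page. The index name commenter_lowercase_idx is hypothetical.

```cql
-- Hypothetical index name; stores all commenter values as lowercase.
CREATE CUSTOM INDEX IF NOT EXISTS commenter_lowercase_idx
ON cycling.comments (commenter)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
  'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
  'normalize_lowercase' : 'true'
};
```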

Examples

All examples use the cycling.comments table.

Creating a SASI PREFIX index on a column

Create a SASI index on the column commenter:
CREATE CUSTOM INDEX IF NOT EXISTS fn_prefix 
ON cycling.comments (commenter) 
USING 'org.apache.cassandra.index.sasi.SASIIndex';

The SASI mode PREFIX is the default, and does not need to be specified.
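
Once the fn_prefix index exists, queries along the following lines should be possible. This is a sketch: the search values 'Al' and 'Alex' are hypothetical and not taken from any sample data.

```cql
-- Prefix match on the indexed column (PREFIX mode).
SELECT * FROM cycling.comments WHERE commenter LIKE 'Al%';

-- Exact match on the indexed column.
SELECT * FROM cycling.comments WHERE commenter = 'Alex';
```

Suffix and substring patterns such as LIKE '%ex' or LIKE '%le%' are not served by a PREFIX index; they require CONTAINS mode.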

Creating a SASI CONTAINS index on a column

Create a SASI index on the column comment:

CREATE CUSTOM INDEX IF NOT EXISTS fn_contains 
ON cycling.comments (comment) 
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
  'mode' : 'CONTAINS'
};
The SASI mode CONTAINS must be specified.
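
A sketch of the pattern queries that a CONTAINS index such as fn_contains is meant to serve. The search term 'ride' is hypothetical.

```cql
-- Substring match: the term may appear anywhere in the comment.
SELECT * FROM cycling.comments WHERE comment LIKE '%ride%';

-- Suffix match: the comment ends with the term.
SELECT * FROM cycling.comments WHERE comment LIKE '%ride';

-- Prefix match: the comment starts with the term.
SELECT * FROM cycling.comments WHERE comment LIKE 'ride%';
```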

Creating a SASI SPARSE index on a column

Define a table and then create a SASI index on the column record_id:
CREATE CUSTOM INDEX IF NOT EXISTS fn_sparse 
ON cycling.comments (record_id) 
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
  'mode' : 'SPARSE'
};
The SASI mode SPARSE must be specified. This mode is intended for numeric columns with many distinct values, each shared by only a few rows, such as timestamps or millisecond sensor readings.
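
To illustrate the kind of data SPARSE mode targets, here is a self-contained sketch. Everything in it is hypothetical and not part of the original examples: the table cycling.sensor_readings, its columns, the index name readings_sparse, and the timestamp values. It assumes that each millisecond value is shared by only a few rows.

```cql
-- Hypothetical table, used only for this sketch.
CREATE TABLE IF NOT EXISTS cycling.sensor_readings (
  sensor_id uuid,
  reading_id timeuuid,
  recorded_at_ms bigint,   -- epoch milliseconds; nearly unique per row
  reading double,
  PRIMARY KEY (sensor_id, reading_id)
);

-- SPARSE must be named explicitly in the options map.
CREATE CUSTOM INDEX IF NOT EXISTS readings_sparse
ON cycling.sensor_readings (recorded_at_ms)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
  'mode' : 'SPARSE'
};

-- Range query over a narrow one-second window of readings.
SELECT * FROM cycling.sensor_readings
WHERE recorded_at_ms >= 1500000000000
  AND recorded_at_ms < 1500000001000;
```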

Creating a SASI PREFIX index on a column using the non-tokenizing analyzer

Define a table, then create a SASI index on the column comment:
CREATE CUSTOM INDEX IF NOT EXISTS fn_notcasesensitive 
ON cycling.comments (comment) 
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 
  'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
  'case_sensitive' : 'false'
};
Use the non-tokenizing analyzer to specify case sensitivity or character case normalization without analyzing the specified column.
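
A sketch of the effect, assuming fn_notcasesensitive is the index serving the comment column. The search term is hypothetical. With 'case_sensitive' : 'false', the two queries below should return the same rows.

```cql
-- Both prefix queries should match comments starting with
-- "great", "Great", or "GREAT".
SELECT * FROM cycling.comments WHERE comment LIKE 'great%';
SELECT * FROM cycling.comments WHERE comment LIKE 'GREAT%';
```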

Creating a SASI analyzing index on a column

Define a table and then create a SASI index on the column comment:
CREATE CUSTOM INDEX IF NOT EXISTS stdanalyzer_idx 
ON cycling.comments (comment) 
USING 'org.apache.cassandra.index.sasi.SASIIndex' 
WITH OPTIONS = {
  'mode' : 'CONTAINS',
  'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
  'analyzed' : 'true',
  'tokenization_skip_stop_words' : 'and, the, or',
  'tokenization_enable_stemming' : 'true',
  'tokenization_normalize_lowercase' : 'true',
  'tokenization_locale' : 'en'
};
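
A sketch of a query against the analyzed index, assuming stdanalyzer_idx is the index serving the comment column. The search term 'riding' is hypothetical. Because the search term is expected to pass through the same analyzer as the indexed text, stemming and lowercase normalization should let it also match comments that contain related forms such as "ride" or "Rides".

```cql
-- The term is analyzed before matching, so no wildcards are used here.
SELECT * FROM cycling.comments WHERE comment LIKE 'riding';
```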