CREATE CUSTOM INDEX (SASI)

Generates a SASI index on a single table column (experimental).

Generates an SSTable Attached Secondary Index (SASI) on a table column.

Attention: The o.a.c.index.Index interface was modified to comply with core storage engine changes. Updated implementations are required. If unsure, drop all existing custom secondary indexes before upgrading to DataStax Enterprise (DSE) 6.0, except DSE Search indexes, which do not need to be replaced. Because a rewrite of custom index implementations is necessary in DSE 6.0, DataStax can help you find a solution.

SASI uses significantly fewer memory, disk, and CPU resources. It enables querying with PREFIX and CONTAINS on strings, similar to the SQL implementation of LIKE 'foo%' or LIKE '%foo%'.

Attention: SASI indexes in DSE are experimental. DataStax does not support SASI indexes for production.

For more information about SASI, see Using SASI.

Synopsis

CREATE CUSTOM INDEX [ IF NOT EXISTS ] [ index_name ]
  ON [keyspace_name.]table_name (column_name)
  USING 'org.apache.cassandra.index.sasi.SASIIndex' 
  [ WITH OPTIONS = { option_map } ] ;

Table 1. Legend
Syntax conventions	Description
UPPERCASE	Literal keyword.
Lowercase	Not literal.
`Italics`	Variable value. Replace with a user-defined value.
`[]`	Optional. Square brackets ( `[]` ) surround optional command arguments. Do not type the square brackets.
`( )`	Group. Parentheses ( `( )` ) identify a group to choose from. Do not type the parentheses.
`|`	Or. A vertical bar ( `|` ) separates alternative elements. Type any one of the elements. Do not type the vertical bar.
`...`	Repeatable. An ellipsis ( `...` ) indicates that you can repeat the syntax element as often as required.
`'Literal string'`	Single quotation ( `'` ) marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.
`{ key : value }`	Map collection. Braces ( `{ }` ) enclose map collections or key value pairs. A colon separates the key and the value.
`<datatype1,datatype2>`	Set, list, map, or tuple. Angle brackets ( `< >` ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.
`cql_statement;`	End CQL statement. A semicolon ( `;` ) terminates all CQL statements.
`[--]`	Separate the command line options from the command arguments with two hyphens ( `--` ). This syntax is useful when arguments might be mistaken for command line options.
`' <schema> ... </schema> '`	Search CQL only: Single quotation marks ( `'` ) surround an entire XML schema declaration.
`@xml_entity='xml_entity_type'`	Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrConfig files.

index_name

Optional identifier for the index. If no name is specified, the default name table_name_column_name_idx is used. Enclose the name in double quotes to use special characters or preserve capitalization.
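As a rough illustration of the naming rules above, the following sketch creates one index without a name and one with a quoted name. The index name "Comment_Idx" is hypothetical, and the statements assume a cycling.comments table with commenter and comment columns that do not already have SASI indexes.

```cql
-- No index_name: the default name follows table_name_column_name_idx,
-- so this index would be named comments_commenter_idx.
CREATE CUSTOM INDEX IF NOT EXISTS
   ON cycling.comments (commenter)
   USING 'org.apache.cassandra.index.sasi.SASIIndex';

-- Quoted index_name (hypothetical): double quotes preserve capitalization.
CREATE CUSTOM INDEX IF NOT EXISTS "Comment_Idx"
   ON cycling.comments (comment)
   USING 'org.apache.cassandra.index.sasi.SASIIndex';
```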

OPTIONS

Define options in simple JSON format.

Specifying an analyzer allows:

Analyzing and indexing text column data
Using word stemming for indexing
Specifying words that can be skipped
Applying localization based on a specified language
Case normalization, like the non-tokenizing analyzer

Analyzer class option

The SASI indexer has two analyzer classes (analyzer_class):

org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer (default analyzer)
org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer

Specify the class:

'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer'

Global options apply to both analyzer classes. Class-specific options are listed under Standard analyzer options and Non-tokenizing analyzer options.

Global options

The following options apply to all analyzer classes:


Option	Description
analyzed	Set to `true` to analyze the literal column using the specified analyzer.
is_literal	Designates a column as literal.
max_compaction_flush_memory_in_mb	Enter the size, in megabytes (MB), of memory used for compaction flushes.

Standard analyzer options

Default analyzer class. The following options are available for the org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer.


Option	Description
tokenization_enable_stemming	Reduce words to their base form. For example, "stemmer", "stemming", and "stemmed" are based on "stem". Default: `false`.
tokenization_skip_stop_words	Comma-separated list of words to ignore, for example 'and, the, or'.
tokenization_locale	Language code of the column, see List of localization codes. Default: `en`.
tokenization_normalize_lowercase	Use lowercase. Default: `false`.
tokenization_normalize_uppercase	Use uppercase. Default: `false`.

Non-tokenizing analyzer options

The following options are available for the org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer.


Option	Description
normalize_lowercase	Index all strings as lowercase. Default: `false`.
normalize_uppercase	Index all strings as uppercase. Default: `false`.
case_sensitive	Controls whether matching is case-sensitive. Set to `false` to ignore case in matching. Default: `true` (case-sensitive indexing).
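
As a minimal sketch of the non-tokenizing options above, the following statement indexes every value of a column in lowercase. The index name commenter_lowercase_idx is hypothetical, and the statement assumes a cycling.comments table with a text column named commenter that has no other SASI index.

```cql
-- Hypothetical index: stores all commenter values as lowercase,
-- without tokenizing the column.
CREATE CUSTOM INDEX commenter_lowercase_idx
   ON cycling.comments (commenter)
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {
      'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
      'normalize_lowercase': 'true' };
```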

Examples

All examples use the cycling.comments table.

Creating a SASI PREFIX index on a column

Create a SASI index on the column commenter:

CREATE CUSTOM INDEX fn_prefix 
   ON cycling.comments (commenter) 
   USING 'org.apache.cassandra.index.sasi.SASIIndex';

The SASI mode PREFIX is the default, and does not need to be specified.
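
Once the fn_prefix index exists, a prefix search might look like the following sketch. The search string 'Al' is a hypothetical value chosen for illustration.

```cql
-- Prefix match: returns rows whose commenter value starts with 'Al'.
SELECT * FROM cycling.comments
   WHERE commenter LIKE 'Al%';
```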

Creating a SASI CONTAINS index on a column

Create a SASI index on the column comment:

CREATE CUSTOM INDEX fn_contains 
   ON cycling.comments (comment) 
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = { 'mode': 'CONTAINS' };

The SASI mode CONTAINS must be specified.
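
With the fn_contains index in place, substring searches on the comment column become possible. The following is a sketch only, and the search string 'ride' is a hypothetical value.

```cql
-- Substring match: returns rows whose comment contains 'ride' anywhere.
SELECT * FROM cycling.comments
   WHERE comment LIKE '%ride%';

-- Suffix match: returns rows whose comment ends with 'ride'.
SELECT * FROM cycling.comments
   WHERE comment LIKE '%ride';
```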

Creating a SASI SPARSE index on a column

Create a SASI index on the column record_id:

CREATE CUSTOM INDEX fn_sparse 
   ON cycling.comments (record_id) 
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = { 'mode': 'SPARSE' };

The SASI mode SPARSE must be specified. This mode is used for dense number columns that store timestamps or millisecond sensor readings.

Creating a SASI PREFIX index on a column using the non-tokenizing analyzer

Create a SASI index on the column comment:

CREATE CUSTOM INDEX fn_notcasesensitive 
   ON cycling.comments (comment) 
   USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = { 
      'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
      'case_sensitive': 'false' };

The non-tokenizing analyzer lets you specify case sensitivity or character case normalization without analyzing the specified column.
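
A query against the fn_notcasesensitive index could be sketched as follows. The search string is hypothetical, and the sketch assumes that fn_notcasesensitive is the index serving the comment column. Because the index uses the default PREFIX mode with 'case_sensitive': 'false', the match should ignore character case.

```cql
-- Case-insensitive prefix match: intended to match comments that start
-- with 'first', 'First', or 'FIRST'.
SELECT * FROM cycling.comments
   WHERE comment LIKE 'first%';
```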

Creating a SASI analyzing index on a column

Create a SASI index on the column comment:

CREATE CUSTOM INDEX stdanalyzer_idx 
   ON cycling.comments (comment) 
   USING 'org.apache.cassandra.index.sasi.SASIIndex' 
   WITH OPTIONS = { 'mode': 'CONTAINS',
                    'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
                    'analyzed': 'true',
                    'tokenization_skip_stop_words': 'and, the, or',
                    'tokenization_enable_stemming': 'true',
                    'tokenization_normalize_lowercase': 'true',
                    'tokenization_locale': 'en' };
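
A search against the analyzed index might look like the following sketch. The search term 'riding' is hypothetical, and the sketch assumes stdanalyzer_idx is the index serving the comment column. With stemming and lowercase normalization enabled, the query may also match rows that contain related word forms or different capitalization.

```cql
-- Token search on the analyzed comment column.
SELECT * FROM cycling.comments
   WHERE comment LIKE '%riding%';
```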