CREATE CUSTOM INDEX (SASI)
Generates a SASI index on a single table column.
In Cassandra 3.4 and later, generate an SSTable Attached Secondary Index (SASI) on a table
column. SASI indexing and querying improves on the earlier secondary index implementation,
particularly for queries that previously required the use of ALLOW FILTERING. SASI uses
significantly fewer memory, disk, and CPU resources. It enables querying with
PREFIX and CONTAINS on strings, similar to the SQL
implementation of LIKE 'foo%' or LIKE '%foo%'.
For more information about SASI, see Using SASI.
Synopsis
CREATE CUSTOM INDEX [IF NOT EXISTS] [index_name]
ON [keyspace_name.]table_name ( column_name )
USING 'org.apache.cassandra.index.sasi.SASIIndex'
[WITH OPTIONS = { option_map }]
| Syntax conventions | Description |
|---|---|
| UPPERCASE | Literal keyword. |
| Lowercase | Not literal. |
| Italics | Variable value. Replace with a user-defined value. |
| [] | Optional. Square brackets ( [] ) surround optional command arguments. Do not type the square brackets. |
| ( ) | Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses. |
| \| | Or. A vertical bar ( \| ) separates alternative elements. Type any one of the elements. Do not type the vertical bar. |
| ... | Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required. |
| 'Literal string' | Single quotation marks ( ' ) must surround literal strings in CQL statements. Use single quotation marks to preserve upper case. |
| { key : value } | Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the key and the value. |
| <datatype1,datatype2> | Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma. |
| cql_statement; | End CQL statement. A semicolon ( ; ) terminates all CQL statements. |
| [--] | Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options. |
| ' <schema> ... </schema> ' | Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration. |
| @xml_entity='xml_entity_type' | Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrConfig files. |
- index_name
- Optional identifier for the index. If no name is specified, Cassandra names the index
table_name_column_name_idx. Enclose in quotes to use special characters or preserve capitalization.
- OPTIONS
- Define options in JSON simple format. Specifying an analyzer allows:
  - Analyzing and indexing text column data
  - Using word stemming for indexing
  - Specifying words that can be skipped
  - Applying localization based on a specified language
  - Case normalization, like the non-tokenizing analyzer
Analyzer class option
The Cassandra SASI indexer has two analyzer classes (analyzer_class):
- (Default) org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer
- org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer
Specify the class in the options map, for example:
'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer'
There are global options that apply to both classes, and class-specific options for the Standard analyzer and the Non-tokenizing analyzer.
Global options
The following options apply to all analyzer classes:
| Option | Description |
|---|---|
| analyzed | Set to true to analyze the literal column using the specified analyzer. |
| is_literal | Designates a column as literal. |
| max_compaction_flush_memory_in_mb | Enter the size. |

Standard analyzer options
Default analyzer class. The following options are available for
org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer.

| Option | Description |
|---|---|
| tokenization_enable_stemming | Reduce words to their base form, for example "stemmer", "stemming", and "stemmed" are based on "stem". Default: false. |
| tokenization_skip_stop_words | Comma-separated list of words to ignore, for example 'and, the, or'. |
| tokenization_locale | Language code of the column, see List of localization codes. Default: en. |
| tokenization_normalize_lowercase | Use lowercase. Default: false. |
| tokenization_normalize_uppercase | Use uppercase. Default: false. |

Non-tokenizing analyzer options
The following options are available for
org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer.

| Option | Description |
|---|---|
| normalize_lowercase | Index all strings as lowercase. Default: false. |
| normalize_uppercase | Index all strings as uppercase. Default: false. |
| case_sensitive | Whether matching is case sensitive. Set to false to ignore case in matching. Default: true (case-sensitive indexing). |

Examples
Creating a SASI PREFIX index on a column
Define a table and then create an SASI index on the column firstname:
CREATE TABLE cycling.cyclist_name (
id UUID PRIMARY KEY,
lastname text,
firstname text
);
CREATE CUSTOM INDEX fn_prefix ON cyclist_name (firstname) USING 'org.apache.cassandra.index.sasi.SASIIndex';
The SASI mode PREFIX is the default, and does not need to be
specified.
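With a PREFIX index in place, prefix and exact-match queries on firstname can be served by the index. The following is a minimal sketch: the inserted rows are hypothetical sample data, and the statements assume the table and index live in the cycling keyspace.

```cql
-- Hypothetical sample rows, for illustration only
INSERT INTO cycling.cyclist_name (id, lastname, firstname)
VALUES (uuid(), 'VOS', 'Marianne');
INSERT INTO cycling.cyclist_name (id, lastname, firstname)
VALUES (uuid(), 'FRAME', 'Alex');

-- Prefix match: first names that begin with 'M'
SELECT * FROM cycling.cyclist_name WHERE firstname LIKE 'M%';

-- Exact match on the indexed column
SELECT * FROM cycling.cyclist_name WHERE firstname = 'Marianne';
```

A suffix or substring pattern such as LIKE '%ne' is not supported by a PREFIX index; that kind of query needs an index created in CONTAINS mode.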
Creating a SASI CONTAINS index on a column
Using the cyclist_name table defined above, create a SASI index on the column firstname:
CREATE CUSTOM INDEX fn_contains ON cyclist_name (firstname)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 'mode': 'CONTAINS' };
The SASI mode CONTAINS must be specified.
Creating a SASI SPARSE index on a column
Create a SASI index on the column age (this example assumes that cyclist_name has a numeric
age column):
CREATE CUSTOM INDEX age_sparse ON cyclist_name (age)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 'mode': 'SPARSE' };
The SASI mode SPARSE must be specified. This mode is intended for dense ranges of numeric
values in which each value matches only a few rows, such as timestamps or millisecond sensor readings.
Creating a SASI PREFIX index on a column using the non-tokenizing analyzer
Create a SASI index on the text column lastname:
CREATE CUSTOM INDEX ln_prefix_nocase ON cyclist_name (lastname)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'};
Using the non-tokenizing analyzer is a way to specify case sensitivity or character case
normalization without tokenizing the specified column.
Creating a SASI analyzing index on a column
Create a SASI index on the column comments (this example assumes that cyclist_name has a text
column named comments):
CREATE CUSTOM INDEX stdanalyzer_idx ON cyclist_name (comments)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
'mode': 'CONTAINS',
'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
'analyzed': 'true',
'tokenization_skip_stop_words': 'and, the, or',
'tokenization_enable_stemming': 'true',
'tokenization_normalize_lowercase': 'true',
'tokenization_locale': 'en'
};
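The indexes created in the examples above are queried with the LIKE operator. The statements below are a sketch under stated assumptions: they assume the cycling keyspace, a CONTAINS index on firstname, and an analyzed CONTAINS index on a comments column. The search terms are hypothetical.

```cql
-- Substring match on firstname; needs an index in CONTAINS mode
SELECT * FROM cycling.cyclist_name WHERE firstname LIKE '%arian%';

-- Suffix match is also available in CONTAINS mode
SELECT * FROM cycling.cyclist_name WHERE firstname LIKE '%ne';

-- Term search on the analyzed comments column (column assumed to exist).
-- With lowercase normalization enabled, matching should not depend on case.
SELECT * FROM cycling.cyclist_name WHERE comments LIKE '%ride%';
```

None of these queries uses ALLOW FILTERING. If Cassandra rejects a pattern, check that the index mode on that column supports it.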
