CREATE CUSTOM INDEX (SASI)
Creates an SSTable Attached Secondary Index (SASI) on a single table column. SASI indexes are experimental.
SASI uses significantly less memory, disk, and CPU. It enables PREFIX and CONTAINS queries on strings, similar to the SQL patterns LIKE 'foo%' and LIKE '%foo%'.
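As an illustration, the queries below sketch how these patterns look in CQL. They make three assumptions:

- The fn_prefix index (PREFIX mode, on commenter) and the fn_contains index (CONTAINS mode, on comment) from the Examples section exist.
- fn_contains is the only SASI index on comment.
- The search terms are hypothetical.

```cql
-- Prefix match: served by a PREFIX-mode index (fn_prefix on commenter).
-- 'Al' is a hypothetical search term.
SELECT commenter, comment FROM cycling.comments WHERE commenter LIKE 'Al%';

-- Substring match: requires a CONTAINS-mode index (fn_contains on comment).
-- 'climb' is a hypothetical search term.
SELECT commenter, comment FROM cycling.comments WHERE comment LIKE '%climb%';

-- A CONTAINS-mode index also accepts suffix patterns.
SELECT commenter, comment FROM cycling.comments WHERE comment LIKE '%race';
```

A PREFIX-mode index accepts only the 'foo%' form. The '%foo%' and '%foo' forms need an index in CONTAINS mode.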
For more information about SASI, see Using SASI.
Synopsis
CREATE CUSTOM INDEX [ IF NOT EXISTS ] [ index_name ] ON [keyspace_name.]table_name (column_name) USING 'org.apache.cassandra.index.sasi.SASIIndex' [ WITH OPTIONS = { option_map } ] ;
Syntax conventions | Description |
---|---|
UPPERCASE | Literal keyword. |
Lowercase | Not literal. |
Italics | Variable value. Replace with a user-defined value. |
[] | Optional. Square brackets ( [] ) surround optional command arguments. Do not type the square brackets. |
( ) | Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses. |
\| | Or. A vertical bar ( \| ) separates alternative elements. Type any one of the elements. Do not type the vertical bar. |
... | Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required. |
'Literal string' | Single quotation marks ( ' ) must surround literal strings in CQL statements. Use single quotation marks to preserve upper case. |
{ key : value } | Map collection. Braces ( { } ) enclose map collections or key-value pairs. A colon separates the key and the value. |
<datatype1,datatype2> | Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma. |
cql_statement; | End CQL statement. A semicolon ( ; ) terminates all CQL statements. |
[--] | Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options. |
' <schema> ... </schema> ' | Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration. |
@xml_entity='xml_entity_type' | Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrConfig files. |
- index_name: Optional identifier for the index. If no name is specified, the default name table_name_column_name_idx is used. Enclose the name in quotes to use special characters or to preserve capitalization.
- OPTIONS: Define options as a map of key-value pairs, for example { 'mode' : 'CONTAINS' }. The mode option sets the index mode: PREFIX (the default), CONTAINS, or SPARSE.

Specifying an analyzer allows:
- Analyzing and indexing text column data
- Using word stemming for indexing
- Specifying words that can be skipped
- Applying localization based on a specified language
- Case normalization, like the non-tokenizing analyzer
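The statements below are a minimal sketch of both parameters.

- The CREATE statement omits index_name, so the default name is generated from the table and column names.
- It passes the mode option as a map entry.
- It assumes commenter has no other index, for example that fn_prefix from the Examples section has not been created.

```cql
-- No index name given: the default name comments_commenter_idx
-- (pattern table_name_column_name_idx) is used.
CREATE CUSTOM INDEX IF NOT EXISTS ON cycling.comments (commenter)
  USING 'org.apache.cassandra.index.sasi.SASIIndex'
  WITH OPTIONS = { 'mode' : 'PREFIX' };

-- The generated name is the one used to drop the index later.
DROP INDEX IF EXISTS cycling.comments_commenter_idx;
```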
Analyzer class option
The SASI indexer has two analyzer classes (analyzer_class):
- org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer (default analyzer)
- org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer

Specify the class as an entry in the options map, for example:
'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer'

Some options are global and apply to both analyzer classes. Others apply only to the standard analyzer or only to the non-tokenizing analyzer.
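Global options
The following options apply to all analyzer classes:

Option | Description |
---|---|
analyzed | When true, the literal column is analyzed using the specified analyzer. |
is_literal | Designates the column as literal. |
max_compaction_flush_memory_in_mb | Maximum size, in MB, of the in-memory index structure kept during compaction. A larger index is flushed to disk in segments and merged in a second pass. |

Standard analyzer options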
Default analyzer class. The following options are available for org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer:

Option | Description |
---|---|
tokenization_enable_stemming | Reduce words to their base form. For example, "stemmer", "stemming", and "stemmed" are based on "stem". Default: false |
tokenization_skip_stop_words | Comma-separated list of words to ignore, for example 'and, the, or'. |
tokenization_locale | Language code of the column; see List of localization codes. Default: en |
tokenization_normalize_lowercase | Normalize tokens to lowercase. Default: false |
tokenization_normalize_uppercase | Normalize tokens to uppercase. Default: false |

Non-tokenizing analyzer options
The following options are available for org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer:

Option | Description |
---|---|
normalize_lowercase | Index all strings as lowercase. Default: false |
normalize_uppercase | Index all strings as uppercase. Default: false |
case_sensitive | When set to false, matching ignores case. Default: true (case-sensitive indexing) |
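For example, the following sketch indexes comment with the non-tokenizing analyzer and stores every value in lowercase. The index name comment_lower_idx is hypothetical. The statement assumes comment has no other SASI index.

```cql
-- Hypothetical index: non-tokenizing analyzer, all values indexed as lowercase.
CREATE CUSTOM INDEX IF NOT EXISTS comment_lower_idx ON cycling.comments (comment)
  USING 'org.apache.cassandra.index.sasi.SASIIndex'
  WITH OPTIONS = {
    'mode' : 'CONTAINS',
    'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
    'normalize_lowercase' : 'true'
  };
```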
Examples
All examples use the cycling.comments table.
Creating a SASI PREFIX index on the column commenter:
CREATE CUSTOM INDEX IF NOT EXISTS fn_prefix ON cycling.comments (commenter) USING 'org.apache.cassandra.index.sasi.SASIIndex';
The SASI mode PREFIX is the default and does not need to be specified.
Creating a SASI CONTAINS index on the column comment:
CREATE CUSTOM INDEX IF NOT EXISTS fn_contains ON cycling.comments (comment) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode' : 'CONTAINS' };
The SASI mode CONTAINS must be specified.
Creating a SASI SPARSE index on the column record_id:
CREATE CUSTOM INDEX IF NOT EXISTS fn_sparse ON cycling.comments (record_id) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode' : 'SPARSE' };
The SASI mode SPARSE must be specified. This mode is intended for dense number columns, such as columns that store timestamps or millisecond sensor readings.
Creating a case-insensitive SASI PREFIX index on the column comment using the non-tokenizing analyzer:
CREATE CUSTOM INDEX IF NOT EXISTS fn_notcasesensitive ON cycling.comments (comment) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive' : 'false' };
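With fn_notcasesensitive in place, a prefix query should match regardless of the case of the stored text, because the analyzer applies the same case handling to the search term. This sketch assumes fn_notcasesensitive is the only SASI index on comment. The search term is hypothetical.

```cql
-- Case-insensitive prefix match through fn_notcasesensitive (PREFIX mode by default).
-- Should match comments that start with "Great", "great", or "GREAT".
SELECT commenter, comment FROM cycling.comments WHERE comment LIKE 'great%';
```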
Creating a SASI analyzing index on the column comment:
CREATE CUSTOM INDEX IF NOT EXISTS stdanalyzer_idx ON cycling.comments (comment) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode' : 'CONTAINS', 'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'analyzed' : 'true', 'tokenization_skip_stop_words' : 'and, the, or', 'tokenization_enable_stemming' : 'true', 'tokenization_normalize_lowercase' : 'true', 'tokenization_locale' : 'en' };
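A query against the analyzing index supplies a word rather than a wildcard pattern. With stemming enabled, the analyzer reduces both the indexed text and the search term to base forms, so related word forms may also match. This sketch assumes stdanalyzer_idx is the only SASI index on comment. The search term is hypothetical.

```cql
-- Term query through stdanalyzer_idx (StandardAnalyzer with stemming and lowercasing).
-- With stemming, 'climbing' may also match comments containing "climbs" or "climbed".
SELECT commenter, comment FROM cycling.comments WHERE comment LIKE 'climbing';
```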