CREATE CUSTOM INDEX (SASI)
Generates an SSTable Attached Secondary Index (SASI) on a single table column (experimental).
The o.a.c.index.Index interface was modified to comply with
core storage engine changes. Updated implementations are required. If unsure, drop all
existing custom secondary indexes before upgrading to DataStax Enterprise (DSE) 6.0, except
DSE Search indexes, which do not need to be replaced. Because a rewrite of custom index
implementations is necessary in DSE 6.0, DataStax can help you find a solution.
SASI uses significantly less memory, disk, and CPU. It enables querying
with PREFIX and CONTAINS on strings, similar to the
SQL implementation of LIKE 'foo%' or LIKE '%foo%'.
For more information about SASI, see Using SASI.
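As a rough illustration of the two query styles, the statements below sketch a prefix match and a substring match in CQL. They assume the cycling.comments table and the fn_prefix and fn_contains indexes created in the Examples section; the search strings are invented for illustration.

```cql
-- Prefix match: relies on a SASI index in PREFIX mode (the default) on commenter.
SELECT * FROM cycling.comments WHERE commenter LIKE 'Al%';

-- Substring match: relies on a SASI index in CONTAINS mode on comment.
SELECT * FROM cycling.comments WHERE comment LIKE '%race%';
```

CQL uses % as the wildcard character in LIKE patterns.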
Synopsis
CREATE CUSTOM INDEX [ IF NOT EXISTS ] [ index_name ] ON [keyspace_name.]table_name (column_name) USING 'org.apache.cassandra.index.sasi.SASIIndex' [ WITH OPTIONS = { option_map } ] ;
Syntax conventions | Description |
---|---|
UPPERCASE | Literal keyword. |
Lowercase | Not literal. |
Italics | Variable value. Replace with a user-defined value. |
[] | Optional. Square brackets ( [] ) surround optional command arguments. Do not type the square brackets. |
( ) | Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses. |
\| | Or. A vertical bar ( \| ) separates alternative elements. Type any one of the elements. Do not type the vertical bar. |
... | Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required. |
'Literal string' | Single quotation marks ( ' ) must surround literal strings in CQL statements. Use single quotation marks to preserve upper case. |
{ key : value } | Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the key and the value. |
<datatype1,datatype2> | Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma. |
cql_statement; | End CQL statement. A semicolon ( ; ) terminates all CQL statements. |
[--] | Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options. |
' <schema> ... </schema> ' | Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration. |
@xml_entity='xml_entity_type' |
Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrConfig files. |
- index_name
- Optional identifier for index. If no name is specified, the default is used, table_name_column_name_idx. Enclose in quotes to use special characters or preserve capitalization.
- OPTIONS
- Define options in JSON simple format.
Specifying an analyzer allows:
- Analyzing and indexing text column data
- Using word stemming for indexing
- Specifying words that can be skipped
- Applying localization based on a specified language
- Case normalization, like the non-tokenizing analyzer
Analyzer class option
The SASI indexer has two analyzer classes (analyzer_class):
- org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer (default analyzer)
- org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer
Specify the class in the option map, for example:
'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer'
There are global options that apply to both analyzer classes, and class-specific options for the Standard Analyzer and the Non-tokenizing Analyzer.
Global options
The following options apply to all analyzer classes:
Option | Description |
---|---|
analyzed | True indicates that the literal column is analyzed using the specified analyzer. |
is_literal | Designates a column as literal. |
max_compaction_flush_memory_in_mb | Enter the size in MB. |
Standard analyzer options
Default analyzer class. The following options are available for the org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer.
Option | Description |
---|---|
tokenization_enable_stemming | Reduce words to their base form, for example "stemmer", "stemming", "stemmed" are based on "stem". Default: false. |
tokenization_skip_stop_words | Comma-separated list of words to ignore, for example 'and, the, or'. |
tokenization_locale | Language code of the column, see List of localization codes. Default: en. |
tokenization_normalize_lowercase | Use lowercase. Default: false. |
tokenization_normalize_uppercase | Use uppercase. Default: false. |
Non-tokenizing analyzer options
The following options are available for the org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer.
Option | Description |
---|---|
normalize_lowercase | Index all strings as lowercase. Default: false. |
normalize_uppercase | Index all strings as uppercase. Default: false. |
case_sensitive | Match case when true; set to false to ignore case in matching. Default: true (case-sensitive indexing). |
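To show how a class-specific option fits into the option map, here is a minimal sketch that indexes strings as lowercase with the non-tokenizing analyzer. The index name commenter_lower_idx is hypothetical, and the statement assumes the cycling.comments table used in the examples, with no other index already defined on the commenter column.

```cql
-- Hypothetical index name; assumes no existing index on commenter.
-- normalize_lowercase is set on its own, without case_sensitive.
CREATE CUSTOM INDEX IF NOT EXISTS commenter_lower_idx
  ON cycling.comments (commenter)
  USING 'org.apache.cassandra.index.sasi.SASIIndex'
  WITH OPTIONS = {
    'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
    'normalize_lowercase': 'true'
  };
```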
Examples
All examples use the cycling.comments table.
Creating a SASI PREFIX index on a column
Create a SASI index on the column commenter:
CREATE CUSTOM INDEX fn_prefix ON cycling.comments (commenter) USING 'org.apache.cassandra.index.sasi.SASIIndex';
The SASI mode PREFIX is the default, and does not need to be specified.
Creating a SASI CONTAINS index on a column
Create a SASI index on the column comment:
CREATE CUSTOM INDEX fn_contains ON cycling.comments (comment) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'CONTAINS' };
The SASI mode CONTAINS must be specified.
Creating a SASI SPARSE index on a column
Create a SASI index on the column record_id:
CREATE CUSTOM INDEX fn_sparse ON cycling.comments (record_id) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'SPARSE' };
The SASI mode SPARSE must be specified. This mode is used for dense number
columns that store timestamps or millisecond sensor readings.
Creating a SASI PREFIX index on a column using the non-tokenizing analyzer
Create a case-insensitive SASI index on the column comment:
CREATE CUSTOM INDEX fn_notcasesensitive ON cycling.comments (comment) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false' };
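With an index like fn_notcasesensitive in place, a prefix query should match regardless of letter case. The following is only a sketch: the search string is invented, and the rows returned depend on the data stored in cycling.comments.

```cql
-- Because case_sensitive is 'false', this is expected to match comments
-- that start with 'great', 'Great', or 'GREAT'.
SELECT * FROM cycling.comments WHERE comment LIKE 'great%';
```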
Creating a SASI analyzing index on a column
Create a SASI index on the column comment:
CREATE CUSTOM INDEX stdanalyzer_idx ON cycling.comments (comment) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'analyzed': 'true', 'tokenization_skip_stop_words': 'and, the, or', 'tokenization_enable_stemming': 'true', 'tokenization_normalize_lowercase': 'true', 'tokenization_locale': 'en' };
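A query against the analyzed index might look like the sketch below. It assumes the stdanalyzer_idx index created above; the search term is invented. How stemming, stop words, and lowercase normalization affect the match depends on the analyzer settings, so treat the result as data-dependent.

```cql
-- Substring search on the analyzed comment column (CONTAINS mode).
SELECT * FROM cycling.comments WHERE comment LIKE '%ride%';
```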