CREATE INDEX

Defines a new index for a single column of a table.

CQL supports creating an index on most columns, including the partition and cluster columns of a PRIMARY KEY, collections, and static columns. The one exception is that an index cannot be defined based on a single-column partition key. All column date types except the following are supported for indexes:

  • counter

  • geospatial types: PointType, LineStringType, PolygonType

  • non-frozen user-defined type (UDT)

  • frozen user-defined type (UDT)

For maps, you can create indexes using the key, value, or entry (a key:value pair). You can create multiple secondary indexes on the same database table, with each index based on any column in the table.

You can define an index on any single column in a table’s composite partition key, i.e., a partition key comprised of multiple columns. If you need to query based on one of those columns, an index is a helpful option. In fact, you can define an index on each column in a composite partition key, if needed.

Defining one or more indexes based on any column in a database table (with the rules noted above) subsequently enables performant queries that use the indexed column to filter results.

Syntax

CREATE [CUSTOM] INDEX [ IF NOT EXISTS ] [ <index_name> ]
  ON [<keyspace_name>.]<table_name>
  ([ KEYS | VALUES | ENTRIES | FULL] <column_name>)
    USING <index_type>
  [ WITH OPTIONS = { <option_map> } ] ;
Syntax legend
Legend
Syntax conventions Description

UPPERCASE

Literal keyword.

Lowercase

Not literal.

< >

Variable value. Replace with a user-defined value.

[]

Optional. Square brackets ([]) surround optional command arguments. Do not type the square brackets.

( )

Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

|

Or. A vertical bar (|) separates alternative elements. Type any one of the elements. Do not type the vertical bar.

...

Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required.

'<Literal string>'

Single quotation (') marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.

{ <key> : <value> }

Map collection. Braces ({ }) enclose map collections or key value pairs. A colon separates the key and the value.

<datatype2

Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.

<cql_statement>;

End CQL statement. A semicolon (;) terminates all CQL statements.

[--]

Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options.

' <<schema\> ... </schema\>> '

Search CQL only: Single quotation marks (') surround an entire XML schema declaration.

@<xml_entity>='<xml_entity_type>'

Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrConfig files.

Parameters

Parameter Description Default

index_name

Optional. Identifier of the index. Enclose in quotes to use special characters or preserve capitalization.

If no name is specified, CQL names the index: <table_name>_<column_name>_idx.1

keyspace_name

Optional. Name of the keyspace that contains the table to index.

If no name is specified, the current keyspace is used.

table_name

Name of the table to index.

column_name

Name of the column to index. If used with a map, the column name is the map name. For maps, you can create indexes using the key, value, entry (a key:value pair), or full content of the collection.

1.

Index names are unique per keyspace. The index name must be a unique identifier for the index for each table within a keyspace. This requirement is true for both vector and non-vector indexes.

Index types

Option Description

SAI

A feature that is not available in all CQL distributions, but is recommended for production use in the distributions where it is available. ifdev:cass50,hcd10[] Define with ` USING sai or `USING `StorageAttachedIndex. If `USING is not specified or SAI is not enabled, the default is 2i. Two cassandra.yaml parameters can be set to create default behavior:

  • legacy_local_table - (default) legacy secondary index (2i), implemented as a hidden local table

  • sai - "storage-attached" index, implemented via optimized SSTable/Memtable-attached indexes

endif:cass50,hcd10[] Define with USING `StorageAttachedIndex`. CREATE CUSTOM INDEX must be used to create an SAI index. If USING is not specified or SAI is not enabled, the default is 2i.

SASI

An experimental feature that is not recommended for production use. Define with USING `org.apache.cassandra.index.sasi.SASIIndex`.

Options

SAI options

The options are specific to the index type and are not required. The <option_map> is a map of options and their values defined in JSON simple format.

Option Description Default

similarity_function

Vector search relies on computing the similarity between vectors to identify relevant matches. Choices are EUCLIDEAN, DOT_PRODUCT, or COSINE.

COSINE

source_model

Configures the index for optimal performance for your vectors. Choices are: openai_v3_large, openai_v3_small, ada002, gecko, bert, other.

other

case_sensitive

Allows case-insensitive searches.

true

normalize

Allow searches to be normalized for Unicode characters. SAI supports Normalization Form C (NFC) Unicode. When set to true, SAI normalizes the different versions of a given Unicode character to a single version, retaining all the marks and symbols in the index. For example, SAI would change the character Å (U+212B) to Å (U+00C5).

false

ascii

Allow searches to be limited to ASCII characters. When set to true, converts alphabetic, numeric, and symbolic characters that are not in the Basic Latin Unicode block (the first 127 ASCII characters) to the ASCII equivalent, if one exists. For example, this option changes à to a.

false

index_analyzer

The class that implements the analyzer. Choices are STANDARD, SIMPLE.

STANDARD

tokenizer

The tokenizer for the index. Choices are standard, whitespace, ngram, keyword, simplepattern, stop, and mapping.

standard

filters

The filters for the index. Choices are: porterstem, lowercase, synonym, and languages, including stem definitions like czechstem.

None.

char_filters

The character filters for the index. Choices are htmlstrip, mapping, and patternreplace.

None.

SASI options

Option Description Default

case_sensitive

Allows case-insensitive searches.

true

normalize

Allow searches to be normalized for Unicode characters. SAI supports Normalization Form C (NFC) Unicode. When set to true, SAI normalizes the different versions of a given Unicode character to a single version, retaining all the marks and symbols in the index. For example, SAI would change the character Å (U+212B) to Å (U+00C5).

false

ascii_only

Allow searches to be limited to ASCII characters. When set to true, converts alphabetic, numeric, and symbolic characters that are not in the Basic Latin Unicode block (the first 127 ASCII characters) to the ASCII equivalent, if one exists. For example, this option changes à to a.

false

mode

The mode of the index. Choices are CONTAINS, PREFIX, SPARSE.

CONTAINS

delimiter

The delimiter for the index.

` ` (whitespace)

analyzer_class

The class that implements the analyzer. Choices are org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer, org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer, org.apache.cassandra.index.sasi.analyzer.SimpleAnalyzer, org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer, org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer, org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer, org.apache.cassandra.index.sasi.analyzer.SimpleAnalyzer, org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer.

org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer

analyzed

Whether the index is analyzed.

true

tokenization_enable_stemming

Whether tokenization stemming is enabled.

true

tokenization_locale

The locale for tokenization.

en

tokenization_normalize_lowercase

Whether tokenization normalization is enabled.

true

tokenization_skip_stop_words

Whether tokenization stop words are skipped.

true

Usage notes

If the column already contains data, it is indexed during the execution of this statement. After an index has been created, it is automatically updated when data in the column changes.

Indexing with the CREATE INDEX command can impact performance. Before creating an index, be aware of when and when not to create an index.

Restriction: Indexing counter columns is not supported.

Supported databases:

  • DataStax DataStax Astra DB (Vector)

  • DataStax DataStax Astra DB DataStax Astra DB DataStax Astra DB Serverless

  • DataStax DataStax Astra DB DataStax Astra DB DataStax Astra DB Classic

  • HCD 1.0

  • DataStax Enterprise 6.9.0 and later

  • DataStax Enterprise 6.8.3 and later

Since indexes are created at the keyspace level and not at the table level. If you use the IF NOT EXISTS in a CREATE [CUSTOM] INDEX command, and an index with the same name already exists on a different table in the keyspace, the command will not succeed. However, no error will indicate that an index with the same name already exists on another table. An error will be thrown only if the index is created without the IF NOT EXISTS clause.

SAI supports the following query operators for tables with SAI indexes:

  • Numerics: =, <, >, , >=, AND

  • Strings: =, AND

  • Collections: =, CONTAINS, CONTAINS KEY

SAI does not supports the following query operators for tables with SAI indexes:

  • Strings or Numerics: LIKE, OR, IN

Supported databases:

  • DataStax DataStax Astra DB DataStax Astra DB DataStax Astra DB Classic

  • HCD 1.0

  • DataStax Enterprise 6.9.0 and later

  • DataStax Enterprise 6.8.3 and later

SASI supports the following query operators for tables with SASI indexes:

  • Numerics: =, <, >, , >=, AND, IN

  • Strings: = AND, IN, LIKE

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com