CREATE INDEX

Defines a new index for a single column of a table.

CQL supports creating an index on most columns, including the partition and clustering columns of a PRIMARY KEY, collections, and static columns. The one exception is a single-column partition key, which cannot be indexed.

All column data types except the following are supported for indexes:

counter
Geospatial types: PointType, LineStringType, PolygonType
Non-frozen user-defined type (UDT)

For maps, you can create indexes using the key, value, or entry (a key:value pair). You can create multiple secondary indexes on the same database table, with each index based on any column in the table.
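The three map index variants can be sketched as follows. The table `cyclist_teams`, its columns, and the index names are hypothetical, and the example assumes a keyspace has already been selected with USE and that your distribution accepts `USING 'sai'`. Note that the collection keyword wraps the column name in its own parentheses.

```cql
-- Hypothetical table with a map column, used only for illustration.
CREATE TABLE IF NOT EXISTS cyclist_teams (
  id uuid PRIMARY KEY,
  lastname text,
  teams map<int, text>   -- year -> team name
);

-- One index per aspect of the map: keys, values, and key:value entries.
CREATE INDEX IF NOT EXISTS teams_keys_idx
  ON cyclist_teams (KEYS(teams)) USING 'sai';
CREATE INDEX IF NOT EXISTS teams_values_idx
  ON cyclist_teams (VALUES(teams)) USING 'sai';
CREATE INDEX IF NOT EXISTS teams_entries_idx
  ON cyclist_teams (ENTRIES(teams)) USING 'sai';

-- The entries index supports filtering on a specific key:value pair.
SELECT id, lastname FROM cyclist_teams WHERE teams[2014] = 'Team Alpha';
```

Create only the variants your queries need, because every index adds write and storage overhead.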

You can define an index on any single column in a table’s composite partition key (a partition key comprised of multiple columns). If you need to query based on one of those columns, an index is a helpful option. You can define an index on each column in a composite partition key, if needed.
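As an illustration of indexing one column of a composite partition key, consider the sketch below. The table `rank_by_year_and_name` and its columns are hypothetical, and the example assumes a keyspace is already selected and that `USING 'sai'` is available in your distribution.

```cql
-- Hypothetical table whose partition key has two columns.
CREATE TABLE IF NOT EXISTS rank_by_year_and_name (
  race_year int,
  race_name text,
  rank int,
  cyclist_name text,
  PRIMARY KEY ((race_year, race_name), rank)
);

-- Index one column of the composite partition key.
CREATE INDEX IF NOT EXISTS rank_race_year_idx
  ON rank_by_year_and_name (race_year) USING 'sai';

-- Filter on race_year alone, without supplying race_name.
SELECT race_name, rank, cyclist_name
  FROM rank_by_year_and_name
  WHERE race_year = 2014;
```

Without the index, a query that restricts only part of the partition key would normally be rejected or would need ALLOW FILTERING.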

Defining one or more indexes based on any column in a database table (with the rules noted above) subsequently enables performant queries that use the indexed column to filter results.

Syntax

CREATE [CUSTOM] INDEX [ IF NOT EXISTS ] [ <index_name> ]
  ON [<keyspace_name>.]<table_name>
  ([ KEYS | VALUES | ENTRIES | FULL] <column_name>)
    USING <index_type>
  [ WITH OPTIONS = { <option_map> } ] ;
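As a minimal sketch of the syntax above, the following statements create a table, index one regular column, and filter on it. The table `cyclists`, its columns, and the index name are hypothetical, and the example assumes a keyspace has already been selected with USE. The `USING 'sai'` form is assumed to be accepted by your distribution; some releases instead expect `CREATE CUSTOM INDEX ... USING 'StorageAttachedIndex'`.

```cql
-- Hypothetical table used only for illustration.
CREATE TABLE IF NOT EXISTS cyclists (
  id uuid PRIMARY KEY,
  lastname text,
  age int
);

-- Index a single regular column. The index name is optional.
CREATE INDEX IF NOT EXISTS cyclists_age_idx
  ON cyclists (age)
  USING 'sai';

-- The indexed column can now be used as a filter.
SELECT id, lastname, age FROM cyclists WHERE age = 30;
```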

Legend
Syntax conventions	Description
UPPERCASE	Literal keyword.
Lowercase	Not literal.
`< >`	Variable value. Replace with a user-defined value.
`[]`	Optional. Square brackets (`[]`) surround optional command arguments. Do not type the square brackets.
`( )`	Group. Parentheses ( `( )` ) identify a group to choose from. Do not type the parentheses.
`|`	Or. A vertical bar (`|`) separates alternative elements. Type any one of the elements. Do not type the vertical bar.
`...`	Repeatable. An ellipsis (`...`) indicates that you can repeat the syntax element as often as required.
`'<Literal string>'`	Single quotation (`'`) marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.
`{ <key> : <value> }`	Map collection. Braces (`{ }`) enclose map collections or key value pairs. A colon separates the key and the value.
`<<datatype1>,<datatype2>>`	Set, list, map, or tuple. Angle brackets (`< >`) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.
`<cql_statement>;`	End CQL statement. A semicolon (`;`) terminates all CQL statements.
`[--]`	Separate the command line options from the command arguments with two hyphens (`--`). This syntax is useful when arguments might be mistaken for command line options.
`'<schema> ... </schema>'`	Search CQL only: Single quotation marks (`'`) surround an entire XML schema declaration.
`@<xml_entity>='<xml_entity_type>'`	Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrConfig files.

Parameters

index_name: Optional identifier for the index. If no name is specified, CQL generates one with the pattern <table_name>_<column_name>_idx. Enclose the name in double quotes to use special characters or to preserve capitalization.

Index names must be unique within a keyspace, because indexes are created at the keyspace level and not at the table level. This requirement applies to all indexes.

If you use IF NOT EXISTS in a CREATE [CUSTOM] INDEX command and an index with the same name already exists in the keyspace, the command does nothing and returns no error. If you want the command to return an error in that case, omit IF NOT EXISTS.
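The naming and IF NOT EXISTS behavior described above might look like this in practice. The table `riders` and the index name are hypothetical, and the example assumes a keyspace is already selected and that `USING 'sai'` is accepted.

```cql
-- Hypothetical table used only for illustration.
CREATE TABLE IF NOT EXISTS riders (
  id uuid PRIMARY KEY,
  lastname text
);

-- Double quotes preserve the capitalization of the index name.
CREATE INDEX IF NOT EXISTS "Riders_Lastname_Idx"
  ON riders (lastname) USING 'sai';

-- Repeating the statement is a no-op because of IF NOT EXISTS.
CREATE INDEX IF NOT EXISTS "Riders_Lastname_Idx"
  ON riders (lastname) USING 'sai';

-- Without IF NOT EXISTS, repeating the statement returns an error instead:
-- CREATE INDEX "Riders_Lastname_Idx" ON riders (lastname) USING 'sai';
```

Using IF NOT EXISTS is convenient in schema scripts that may run more than once.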

keyspace_name: Optional name of the keyspace that contains the table to index. If no name is specified, the current keyspace is used.
table_name: The name of the table on which the index is being defined.
column_name: The name of the column to index.

If used with a map, the column name is the map name. For maps, you can create indexes using the key, value, entry (a key:value pair), or full content of the collection.

SAI allows only alphanumeric characters and underscores in names. SAI returns InvalidRequestException if you try to define an index on a column name that contains other characters, and does not create the index.
map_name: Used with collections, identifier of the map_name specified in CREATE TABLE … map(<map_name>). The regular column syntax applies for collection types list and set.
CREATE INDEX | CREATE CUSTOM INDEX, USING <index_type>: See Index type options.
WITH OPTIONS = { <option_map> }: See SAI options.

Index type options

Option	Description
`SAI`	A feature that is not available in all CQL distributions, but is recommended for production use in the distributions where it is available. Not required for Astra DB.

SAI options

The options are specific to the index type and are not required. The <option_map> is a map of options and their values defined in JSON simple format.

Option	Description	Default
`similarity_function`	Vector search relies on computing the similarity between vectors to identify relevant matches. The similarity function is used to compute the similarity between two vectors. Choices are `EUCLIDEAN`, `DOT_PRODUCT`, or `COSINE`.	`COSINE`
`source_model`	Configures the index for optimal performance for your vectors. Choices are: `openai_v3_large`, `openai_v3_small`, `ada002`, `gecko`, `bert`, `other`.	`other`
`case_sensitive`	Whether searches are case-sensitive. Set to `false` to allow case-insensitive searches.	`true`
`normalize`	Allow searches to be normalized for Unicode characters. SAI supports Normalization Form C (NFC) Unicode. When set to `true`, SAI normalizes the different versions of a given Unicode character to a single version, retaining all the marks and symbols in the index. For example, SAI would change the character Å (U+212B) to Å (U+00C5). When implementations keep strings in a normalized form, equivalent strings have a unique binary representation. See Unicode Standard Annex #15, Unicode Normalization Forms.	`false`
`ascii`	Allow searches to be limited to ASCII characters. When set to `true`, SAI converts alphabetic, numeric, and symbolic characters that are not in the Basic Latin Unicode block (the first 128 Unicode characters, which correspond to ASCII) to the ASCII equivalent, if one exists. For example, this option changes à to a.	`false`
`index_analyzer`	The class that implements the analyzer. Choices are `STANDARD`, `SIMPLE`.	`STANDARD`
`tokenizer`	The tokenizer for the index. Choices are standard, whitespace, ngram, keyword, simplepattern, stop, and mapping.	`standard`
`filters`	The filters for the index. Choices are: porterstem, lowercase, synonym, and languages, including stem definitions like czechstem.	None.
`char_filters`	The character filters for the index. Choices are htmlstrip, mapping, and patternreplace.	None.
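A minimal sketch of passing options from the table above through WITH OPTIONS follows. The table `comments`, its columns, and the index names are hypothetical. The example assumes a keyspace is already selected, that `USING 'sai'` is accepted, and that your distribution supports the `vector` data type.

```cql
-- Hypothetical table with a text column and a small vector column.
CREATE TABLE IF NOT EXISTS comments (
  id uuid PRIMARY KEY,
  commenter text,
  comment_vector vector<float, 3>
);

-- Text index: case-insensitive, Unicode-normalized, ASCII-folded matching.
CREATE INDEX IF NOT EXISTS comments_commenter_idx
  ON comments (commenter)
  USING 'sai'
  WITH OPTIONS = { 'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true' };

-- Vector index: override the default COSINE similarity function.
CREATE INDEX IF NOT EXISTS comments_vector_idx
  ON comments (comment_vector)
  USING 'sai'
  WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };

-- With case_sensitive set to false, this is intended to match 'Alex' as well.
SELECT id FROM comments WHERE commenter = 'alex';

-- Approximate nearest neighbor search on the indexed vector column.
SELECT id FROM comments
  ORDER BY comment_vector ANN OF [0.1, 0.2, 0.3]
  LIMIT 2;
```

Option keys and values are written as quoted strings inside the map.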

SAI advanced options

These options are intended for advanced users who need to fine-tune the performance of vector indexes. Most use cases won’t require changes to these settings. Modifying these settings can significantly impact the performance and effectiveness of vector searches. Experiment with these settings in a test environment before modifying them in production.

Vector index creation results in a graph that connects vectors to their nearest neighbors. Each vector is represented as a node, with edges linking it to nearby vectors in the embedding space. During a search, the index traverses this graph to efficiently locate approximate nearest neighbors.

You can use these advanced options to tune the index for recall or latency.

Option	Description	Default
`maximum_node_connections`	Controls the maximum number of connections per node in the graph. The actual graph degree is twice this value. Higher values increase graph quality but also increase storage and query costs. Must be an integer from `1` to `512`.	`16`
`construction_beam_width`	Controls how many candidates to evaluate during graph construction. Higher values increase graph quality but also increase build time. Must be an integer from `1` to `3200`.	`100`
`enable_hierarchy`	Whether to enable hierarchical graph construction.	`false`
`neighborhood_overflow`	Controls graph pruning during construction. Higher values result in denser graphs. Explicitly setting this value overrides both memtable and SSTable defaults. Must be greater than `0`.	`1.0` in memtable, `1.2` in SSTables
`alpha`	Controls how aggressively to explore the graph during search. Higher values increase recall at the cost of latency. Must be greater than `0`.	`1.2`
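The advanced options might be supplied in the same option map as the similarity function, as in this sketch. The table `documents` and the index name are hypothetical, and the chosen values are illustrative rather than recommendations. Writing the numeric values as quoted strings is an assumption about how the option map is parsed. The example also assumes a keyspace is already selected and that your distribution supports vector indexes and accepts `USING 'sai'`.

```cql
-- Hypothetical table with a small vector column, used only for illustration.
CREATE TABLE IF NOT EXISTS documents (
  id uuid PRIMARY KEY,
  embedding vector<float, 3>
);

-- Denser graph and wider construction beam than the defaults (16 and 100).
-- Assumption: numeric option values are passed as quoted strings.
CREATE INDEX IF NOT EXISTS documents_embedding_idx
  ON documents (embedding)
  USING 'sai'
  WITH OPTIONS = {
    'similarity_function': 'COSINE',
    'maximum_node_connections': '32',
    'construction_beam_width': '200'
  };
```

Larger values for both settings trade longer index builds and more storage for graph quality, so compare recall and latency in a test environment first.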

Usage notes

If the column already contains data, it is indexed during the execution of this statement. After an index has been created, it is automatically updated when data in the column changes.

Indexing with the CREATE INDEX command can impact performance. Before creating an index, be aware of when and when not to create an index.

SAI notes

You can create multiple secondary indexes on the same database table, with each SAI index based on any column in the table. All column data types are supported for SAI indexes except the following:

counter
Geospatial types: PointType, LineStringType, PolygonType
Non-frozen user-defined type (UDT)

You cannot define an SAI index based on the partition key when it’s comprised of only one column. If you attempt to create an SAI index in this case, SAI issues an error message.

Defining one or more SAI indexes based on any column in a database table (with the rules noted above) subsequently gives you the ability to run performant queries that use the indexed column to filter results.

Supported databases for SAI:

Astra DB
HCD
DSE 6.8.3 and later
Cassandra 5.0 and later

Supported query operators for tables with SAI indexes:

The OR operator is only supported in Astra DB Serverless (vector) databases.

Numerics: =, <, >, <=, >=, AND, OR
Strings: =, AND, OR
Collections: =, CONTAINS, CONTAINS KEY
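The numeric and string operators listed above can be sketched against a small table. The table `race_results`, its columns, and the index names are hypothetical, and the example assumes a keyspace is already selected and that `USING 'sai'` is accepted.

```cql
-- Hypothetical table used only for illustration.
CREATE TABLE IF NOT EXISTS race_results (
  id uuid PRIMARY KEY,
  country text,
  points int
);

CREATE INDEX IF NOT EXISTS race_results_country_idx
  ON race_results (country) USING 'sai';
CREATE INDEX IF NOT EXISTS race_results_points_idx
  ON race_results (points) USING 'sai';

-- Numeric range on one indexed column.
SELECT id, points FROM race_results
  WHERE points >= 10 AND points < 50;

-- String equality combined with a numeric comparison using AND.
SELECT id, country, points FROM race_results
  WHERE country = 'NL' AND points > 20;

-- OR is only available where the database supports it:
-- SELECT id FROM race_results WHERE country = 'NL' OR country = 'BE';
```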

In CQL queries using SAI indexes, the CONTAINS clauses are supported with, and specific to, the following collection types:

SAI collection maps with keys, values, and entries
SAI collections with list and set types
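The CONTAINS clauses for set, list, and map columns can be illustrated as follows. The table `sponsors`, its columns, and the index names are hypothetical, and the example assumes a keyspace is already selected and that `USING 'sai'` is accepted.

```cql
-- Hypothetical table with one column of each collection type.
CREATE TABLE IF NOT EXISTS sponsors (
  id uuid PRIMARY KEY,
  tags set<text>,
  stages list<int>,
  budgets map<text, int>
);

-- Set and list columns use the regular column syntax.
CREATE INDEX IF NOT EXISTS sponsors_tags_idx
  ON sponsors (tags) USING 'sai';
CREATE INDEX IF NOT EXISTS sponsors_stages_idx
  ON sponsors (stages) USING 'sai';

-- Map columns need a separate index for keys and for values.
CREATE INDEX IF NOT EXISTS sponsors_budget_keys_idx
  ON sponsors (KEYS(budgets)) USING 'sai';
CREATE INDEX IF NOT EXISTS sponsors_budget_values_idx
  ON sponsors (VALUES(budgets)) USING 'sai';

-- CONTAINS matches an element of a set or list, or a value of a map.
SELECT id FROM sponsors WHERE tags CONTAINS 'bikes';
SELECT id FROM sponsors WHERE stages CONTAINS 3;
SELECT id FROM sponsors WHERE budgets CONTAINS 5000;

-- CONTAINS KEY matches a map key.
SELECT id FROM sponsors WHERE budgets CONTAINS KEY 'marketing';
```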

For more information about SAI, see the SAI section.

Examples

Detailed examples can be found for each type of indexing:

Index type	Example links
SAI	SAI quickstart primary key clustering column non-primary key types collection types vector search
