CREATE CUSTOM INDEX

Supported databases:

  • DataStax DataStax Astra DB (Vector)

  • DataStax DataStax Astra DB DataStax Astra DB DataStax Astra DB Serverless

  • DataStax DataStax Astra DB DataStax Astra DB DataStax Astra DB Classic

  • HCD 1.0

  • DataStax Enterprise 6.9.0 and later

  • DataStax Enterprise 6.8.3 and later

Creates a Storage-Attached Indexing (SAI) index. You can create multiple secondary indexes on the same database table, with each SAI index based on any column in the table. All column date types except the following are supported for SAI indexes:

  • counter

  • geospatial types: PointType, LineStringType, PolygonType

  • non-frozen user-defined type (UDT)

One exception

You cannot define an SAI index based on the partition key when it’s comprised of only one column. If you attempt to create an SAI index in this case, SAI issues an error message.

However, you can define an SAI index on one of the columns in a table’s composite partition key, i.e., a partition key comprised of multiple columns. If you need to query based on one of those columns, an SAI index is a helpful option. In fact, you can define an SAI index on each column in a composite partition key, if needed.

Defining one or more SAI indexes based on any column in a database table (with the rules noted above) subsequently gives you the ability to run performant queries that use the indexed column to filter results.

Since indexes are created at the keyspace level and not at the table level. If you use the IF NOT EXISTS in a CREATE [CUSTOM] INDEX command, and an index with the same name already exists on a different table in the keyspace, the command will not succeed. However, no error will indicate that an index with the same name already exists on another table. An error will be thrown only if the index is created without the IF NOT EXISTS clause.

See the SAI section.

Synopsis

CREATE CUSTOM INDEX [ IF NOT EXISTS ] [ <index_name> ]
  ON [ <keyspace_name>.]<table_name> (<column_name>)
    | [ (KEYS(<map_name>)) ]
    | [ (VALUES(<map_name>)) ]
    | [ (ENTRIES(<map_name>)) ]
  USING 'StorageAttachedIndex'
  [ WITH OPTIONS = { <option_map> } ] ;
Syntax legend
Legend
Syntax conventions Description

UPPERCASE

Literal keyword.

Lowercase

Not literal.

< >

Variable value. Replace with a user-defined value.

[]

Optional. Square brackets ([]) surround optional command arguments. Do not type the square brackets.

( )

Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

|

Or. A vertical bar (|) separates alternative elements. Type any one of the elements. Do not type the vertical bar.

...

Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required.

'<Literal string>'

Single quotation (') marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.

{ <key> : <value> }

Map collection. Braces ({ }) enclose map collections or key value pairs. A colon separates the key and the value.

<datatype2

Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.

<cql_statement>;

End CQL statement. A semicolon (;) terminates all CQL statements.

[--]

Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options.

' <<schema\> ... </schema\>> '

Search CQL only: Single quotation marks (') surround an entire XML schema declaration.

@<xml_entity>='<xml_entity_type>'

Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrConfig files.

index_name

Optional identifier for index. If no name is specified, the default is used, <table_name>_<column_name>_idx. Enclose in quotes to use special characters or to preserve capitalization.

Index names are unique per keyspace. The index name must be a unique identifier for the index for each table within a keyspace. Enclose in quotes to use special characters or preserve capitalization. If you do not specify an index name, CQL generates one for you, with the pattern <table_name>_<column_name>_idx. This requirement is true for all indexes.

column_name

The name of the table column on which the SAI index is being defined. SAI allows only alphanumeric characters and underscores in names. SAI returns InvalidRequestException if you try to define an index on a column name that contains other characters, and does not create the index.

map_name

Used with collections, identifier of the map_name specified in CREATE TABLE …​ map(<map_name>). The regular column syntax applies for collection types list and set.

option_map

Define options in JSON simple format.

Option Description

case_sensitive

Ignore case in matching string values. Default: true.

normalize

When set to true, perform Unicode normalization on indexed strings. SAI supports Normalization Form C (NFC) Unicode. When set to true, SAI normalizes the different versions of a given Unicode character to a single version, retaining all the marks and symbols in the index. For example, SAI would change the character Å (U+212B) to Å (U+00C5).

When implementations keep strings in a normalized form, equivalent strings have a unique binary representation. See Unicode Standard Annex #15, Unicode Normalization Forms.

Default: false.

ascii

When set to true, SAI converts alphabetic, numeric, and symbolic characters that are not in the Basic Latin Unicode block (the first 127 ASCII characters) to the ASCII equivalent, if one exists. For example, this option changes à to a. Default: false.

similarity_function

Vector search relies on computing the similarity or distance between vectors to identify relevant matches. The similarity function is used to compute the similarity between two vectors. Valid options are: EUCLIDEAN, DOT_PRODUCT, COSINE Default: COSINE

Query operators

SAI supports the following query operators for tables with SAI indexes:

  • Numerics: =, <, >, , >=, AND, IN, OR (for vector databases)

  • Strings: =, AND, IN

  • Collections: = CONTAINS, CONTAINS KEY

SAI does not supports the following query operators for tables with SAI indexes:

  • Strings or Numerics: LIKE, OR (for non-vector databases)

Examples

These examples define SAI indexes for the cycling.cyclist_semi_pro table, which is demonstrated in the SAI quickstart.

CREATE CUSTOM INDEX lastname_idx ON cycling.cyclist_semi_pro (lastname) 
  USING 'StorageAttachedIndex' 
  WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'}; 
CREATE CUSTOM INDEX age_idx ON cycling.cyclist_semi_pro (age) 
  USING 'StorageAttachedIndex'; 
CREATE CUSTOM INDEX country_idx ON cycling.cyclist_semi_pro (country) 
  USING 'StorageAttachedIndex' 
  WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'}; 
CREATE CUSTOM INDEX registration_idx ON cycling.cyclist_semi_pro (registration) 
  USING 'StorageAttachedIndex'; 

For sample queries that find data in cycling.cyclist_semi_pro via these sample SAI indexes, see Submit CQL queries. Also refer Examine SAI column index and query rules.

SAI collection map examples with keys, values, and entries

The following examples demonstrate using collection maps of multiple types (keys, values, entries) in SAI indexes. For related information, see Creating collections and Using map type.

Also refer to the SAI collection examples of type list and set in this topic.

If you are logged into the DataStax Astra CQL Console, you already have a keyspace defined for your cloud database; skip this next step and specify your keyspace name with the USE command.

Next, use the keyspace:

USE demo3;

Create an audit table, with a collection map named text_map:

CREATE TABLE audit ( id int PRIMARY KEY , text_map map<text, text>);

Create multiple SAI indexes on the same map column, each using KEYS, VALUES, and ENTRIES.

CREATE CUSTOM INDEX ON audit (KEYS(text_map)) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX ON audit (VALUES(text_map)) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX ON audit (ENTRIES(text_map)) USING 'StorageAttachedIndex';

Insert some data:

INSERT INTO audit (id, text_map) values (1, {'Carlos':'Perotti', 'Marcel':'Silva'});
INSERT INTO audit (id, text_map) values (2, {'Giovani':'Pasi', 'Frances':'Giardello'});
INSERT INTO audit (id, text_map) values (3, {'Mark':'Pastore', 'Irene':'Cantona'});

Query all data:

  • Query

  • Result

SELECT * FROM audit;
 id | text_map
----+---------------------------------------------
  1 | {'Carlos': 'Perotti', 'Marcel': 'Silva'}
  2 | {'Frances': 'Giardello', 'Giovani': 'Pasi'}
  3 | {'Irene': 'Cantona', 'Mark': 'Pastore'}

(3 rows)

Query using the SAI index for specific entries in the map column:

  • Query

  • Result

SELECT * FROM audit WHERE text_map['Irene'] = 'Cantona' AND text_map['Mark'] = 'Pastore';
 id | text_map
----+-----------------------------------------
  3 | {'Irene': 'Cantona', 'Mark': 'Pastore'}

(1 rows)

Query using the SAI index for specific keys in the map column using CONTAINS KEY:

  • Query

  • Result

SELECT * FROM audit WHERE text_map CONTAINS KEY 'Giovani';
 id | text_map
----+---------------------------------------------
  2 | {'Frances': 'Giardello', 'Giovani': 'Pasi'}

(1 rows)

Query using the SAI index for specific values in the map column with CONTAINS:

  • Query

  • Result

SELECT * FROM audit WHERE text_map CONTAINS 'Silva';
 id | text_map
----+------------------------------------------
  1 | {'Carlos': 'Perotti', 'Marcel': 'Silva'}

(1 rows)

Remember that in CQL queries using SAI indexes, the CONTAINS clauses are supported with, and specific to:

  • SAI collection maps with keys, values, and entries

  • SAI collections with list and set types

SAI collection examples with list and set types

These examples demonstrate using collections with the list and set types in SAI indexes. For related information, see:

If you have not already, create the keyspace.

CREATE KEYSPACE IF NOT EXISTS demo3 WITH REPLICATION =
       {'class': 'SimpleStrategy', 'replication_factor': '1'};
USE demo3;

Using the list type

Create a calendar table with a collection of type list.

CREATE TABLE calendar (key int PRIMARY KEY, years list<int>);

Create an SAI index using the collection’s years column.

CREATE CUSTOM INDEX ON calendar(years) USING 'StorageAttachedIndex';

Insert some random int list data for years, just for demo purposes.

Notice the INSERT command’s square brackets syntax for list values.

INSERT INTO calendar (key, years) VALUES (0, [1990,1996]);
INSERT INTO calendar (key, years) VALUES (1, [2000,2010]);
INSERT INTO calendar (key, years) VALUES (2, [2001,1990]);

Query with CONTAINS example:

  • Query

  • Result

SELECT * FROM calendar WHERE years CONTAINS 1990;
 key | years
-----+--------------
   0 | [1990, 1996]
   2 | [2001, 1990]
(2 rows)

This example created the calendar table with years list<int>. Of course, you could have created the table with years list<text>, for example, inserted 'string' values, and queried on the strings.

Using the set type

Now create a calendar2 table with a collection of type set.

CREATE TABLE calendar2 (key int PRIMARY KEY, years set<int>);

Create an SAI index using the collection’s years column — this time for the calendar2 table.

CREATE CUSTOM INDEX ON calendar2(years) USING 'StorageAttachedIndex';

Insert some random int set data for years, again just for demo purposes.

Notice the INSERT command’s curly braces syntax for set values.

INSERT INTO calendar2 (key, years) VALUES (0, {1990,1996});
INSERT INTO calendar2 (key, years) VALUES (1, {2000,2010});
INSERT INTO calendar2 (key, years) VALUES (2, {2001,1990,2020});

Query with CONTAINS example from the list:

  • Query

  • Result

SELECT * FROM calendar2  WHERE years CONTAINS 1990;
 key | years
-----+--------------------
   0 |       {1990, 1996}
   2 | {1990, 2001, 2020}

(2 rows)

Remember that in CQL queries using SAI indexes, the CONTAINS clauses are supported with, and specific to:

  • SAI collection maps with keys, values, and entries

  • SAI collections with list and set types

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com