Create SAI index

To create an SAI index, you must define the index name, table name, and column name for the column to be indexed.

Index names are unique per keyspace. The index name must be a unique identifier for the index for each table within a keyspace. This requirement is true for both vector and non-vector indexes.

Prerequisites

To create a simple SAI index:

CREATE CUSTOM INDEX lastname_sai_idx ON cycling.cyclist_semi_pro (lastname) 
USING 'StorageAttachedIndex' 
WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'}; 

CREATE CUSTOM INDEX age_sai_idx ON cycling.cyclist_semi_pro (age) 
USING 'StorageAttachedIndex';

CREATE CUSTOM INDEX country_sai_idx ON cycling.cyclist_semi_pro (country) 
USING 'StorageAttachedIndex'
WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'}; 

CREATE CUSTOM INDEX registration_sai_idx ON cycling.cyclist_semi_pro (registration) 
USING 'StorageAttachedIndex'; 

For most SAI indexes, the column name is defined in the CREATE CUSTOM INDEX statement that also uses USING 'StorageAttachedIndex'. The SAI index options are defined in the WITH OPTIONS clause. The case_sensitive option is set to false to allow case-insensitive searches. The normalize option is set to true to allow searches to be normalized for Unicode characters. The ascii_only option is set to true to allow searches to be limited to ASCII characters.

The map collection data type is the one exception, as shown in the example below.

Partition key SAI error

SAI indexes cannot be created on the partition key, as a primary index already exists and is used for queries. If you attempt to create an SAI on the partition key column, an error will be returned:

  • CQL

  • Result

CREATE CUSTOM INDEX ON demo2.person_id_name_primarykey (id)
       USING 'StorageAttachedIndex';
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot create secondary index on the only partition key column id"

map collection in SAI index

Map collections do have a different format than other SAI indexes:

// Create an index on a map key to find all cyclist/team combos for a year 
CREATE INDEX IF NOT EXISTS team_year_keys_idx
ON cycling.cyclist_teams ( KEYS (teams) );

// Create an index on a map key to find all cyclist/team combos for a year 
CREATE INDEX IF NOT EXISTS team_year_values_idx
ON cycling.cyclist_teams ( VALUES (teams) );

// Create an index on a map key to find all cyclist/team combos for a year 
CREATE INDEX IF NOT EXISTS team_year_entries_idx
ON cycling.cyclist_teams ( ENTRIES (teams) );

Vector index options

This example uses the following table:

CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  record_id timeuuid,
  id uuid,
  commenter text,
  comment text,
  comment_vector VECTOR <FLOAT, 5>,
  created_at timestamp,
  PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);

Vector indexes support source_model and similarity_function options for the CREATE INDEX command.

Parameter Description Default

source_model

Configures the index for optimal performance for your vectors. Options are: openai_v3_large, openai_v3_small, ada002, gecko, bert, other.

other

similarity_function

Specified similarity function for your index. If you selected a source_model, do not include a similarity_function. Options are: dot_product, cosine, euclidean.

cosine

This example creates an index on the comment_vector column with the similarity function set to dot_product:

CREATE CUSTOM INDEX sim_comments_idx 
    ON cycling.comments_vs (comment_vector) 
    USING 'StorageAttachedIndex'  
    WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT'};

See CREATE CUSTOM INDEX for more information about creating SAI indexes.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com