Create SAI index

Create indexes on one or more columns after defining a table. Secondary indexes created with SAI can be used to query a table using any column other than the table’s partition key. If a composite partition key is defined, a SAI index can be created on any of the partition key columns.

If you decide to create an index, always create either a secondary index (2i) or a storage-attached index (SAI) on a table column, but not both. CQL will be harder to tune and understanding performance will be more difficult if you create both types of indexes on the same table. If you create either a secondary index (2i) or a storage-attached index (SAI) on a table, do not add a DSE Search index to the same table column. See the difference between these index types in the overview.

To create an SAI index, define the table name and column name for the column to be indexed. Index name is optional. Index names are unique per keyspace. The index name must be a unique identifier for the index for each table within a keyspace. Enclose in quotes to use special characters or preserve capitalization. If you do not specify an index name, CQL generates one for you, with the pattern <table_name>_<column_name>_idx. This requirement is true for all indexes.

Since indexes are created at the keyspace level and not at the table level. If you use the IF NOT EXISTS in a CREATE [CUSTOM] INDEX command, and an index with the same name already exists on a different table in the keyspace, the command will not succeed. However, no error will indicate that an index with the same name already exists on another table. An error will be thrown only if the index is created without the IF NOT EXISTS clause.

The full syntax for creating an SAI index in CQL is as follows:

CREATE CUSTOM INDEX index_name ON keyspace_name.table_name (column_name) USING 'StorageAttachedIndex' ;

However, some products have added syntactical sugar to make it easier to create SAI indexes. For example, you can create an SAI index using the following syntax in Hyper-Converged Database and Apache Cassandra 5.0:

CREATE INDEX index_name ON keyspace_name.table_name (column_name) USING 'SAI' ;

Apache Cassandra 5.0 has additional syntactical sugar that allows you to create an SAI index using the following syntax:

CREATE INDEX index_name ON keyspace_name.table_name (column_name);

if the default_secondary_index parameter in the cassandra.yaml file is set to SAI.

Prerequisites

To create a simple SAI index:

CREATE CUSTOM INDEX lastname_idx ON cycling.cyclist_semi_pro (lastname) 
  USING 'StorageAttachedIndex' 
  WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'}; 
CREATE CUSTOM INDEX age_idx ON cycling.cyclist_semi_pro (age) 
  USING 'StorageAttachedIndex'; 
CREATE CUSTOM INDEX country_idx ON cycling.cyclist_semi_pro (country) 
  USING 'StorageAttachedIndex' 
  WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'}; 
CREATE CUSTOM INDEX registration_idx ON cycling.cyclist_semi_pro (registration) 
  USING 'StorageAttachedIndex'; 

To create an SAI index:

  CREATE CUSTOM INDEX id_idx ON cycling.cyclist_alt_stats (id) 
    USING 'StorageAttachedIndex';

Vector index options

This example uses the following table:

CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  record_id timeuuid,
  id uuid,
  commenter text,
  comment text,
  comment_vector VECTOR <FLOAT, 5>,
  created_at timestamp,
  PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);

Vector indexes support source_model and similarity_function options for the CREATE INDEX command.

Parameter Description Default

source_model

Configures the index for optimal performance for your vectors. Options are: openai_v3_large, openai_v3_small, ada002, gecko, bert, other.

other

similarity_function

Specified similarity function for your index. If you selected a source_model, do not include a similarity_function. Options are: dot_product, cosine, euclidean.

cosine

This example creates an index on the comment_vector column with the similarity function set to dot_product:

CREATE CUSTOM INDEX comment_sim_function_idx 
  ON cycling.comments_vs (comment_vector) USING 'StorageAttachedIndex'  
    WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT'};

This example creates an index on the comment_vector column with the source model set to bert:

  CREATE CUSTOM INDEX comment_source_model_idx 
    ON cycling.comments_vs (comment_vector) USING 'StorageAttachedIndex'
      WITH OPTIONS = { 'source_model': 'bert'};

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com