Create vector indexes
Vector search uses Storage-Attached Indexing (SAI) to index and search vector data. This page describes how to create tables with vector columns and the indexes required for vector search.
Vector data type
The CQL vector data type is used to store vector data.
It supports vectors of 32-bit floating-point numbers with 1 to 65,535 dimensions.
This example creates a table with a vector column of 4 dimensions; production use cases typically use vectors with many more dimensions.
CREATE TABLE products (
product_id UUID PRIMARY KEY,
categories SET<TEXT>,
name TEXT,
price DECIMAL,
description VECTOR<FLOAT, 4>
);
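Because vector elements are 32-bit floats, values that a client computes in 64-bit precision are rounded when they are stored. The following Python sketch illustrates that rounding using only the standard library. It does not connect to a database, and the helper name `to_float32` and the sample values are hypothetical.

```python
import struct


def to_float32(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    # Pack as a 4-byte IEEE 754 single, then unpack back to a Python float.
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Hypothetical 4-dimensional vector, matching the VECTOR<FLOAT, 4> column.
embedding = [0.12, 0.34, 0.56, 0.78]
stored = [to_float32(x) for x in embedding]

# Show each 64-bit value next to its 32-bit approximation.
for original, rounded in zip(embedding, stored):
    print(f"{original!r} -> {rounded!r}")
```

The differences are tiny, but they explain why a value read back from a vector column may not compare equal to the double-precision value the client originally wrote.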
Index a vector column
To enable vector search, you must create an SAI index on a vector column.
The syntax for creating an SAI index on a vector column is the same as for other data types.
CREATE CUSTOM INDEX products_idx
ON products (description) USING 'StorageAttachedIndex';
Choose a similarity function
You can choose a similarity function when you create an SAI index on a vector column.
If you do not specify a similarity function, the default is cosine.
Once selected, the similarity function cannot be changed without dropping and recreating the index.
Similarity functions are also known as similarity metrics.
The supported similarity functions are:
| Metric | Description |
|---|---|
| cosine | Default metric. Calculates the cosine of the angle between two vectors. |
| dot_product | Compares vectors by calculating their dot product. More efficient than cosine. |
| euclidean | Calculates the Euclidean distance between two vectors. |
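To make the three metrics concrete, the following Python sketch computes each one for a pair of 4-dimensional vectors using the textbook definitions. It is an illustration of the math only: the function names and sample vectors are hypothetical, and how SAI turns these quantities into search scores internally is not covered here and may differ.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (independent of vector length)."""
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )


def euclidean_distance(a, b):
    """Straight-line distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    """Scale v to unit length."""
    length = math.sqrt(dot_product(v, v))
    return [x / length for x in v]


# Hypothetical sample vectors.
a = [0.12, 0.34, 0.56, 0.78]
b = [0.22, 0.18, 0.91, 0.44]

print("cosine:     ", cosine_similarity(a, b))
print("dot product:", dot_product(a, b))
print("euclidean:  ", euclidean_distance(a, b))

# For unit-length vectors, the dot product equals the cosine similarity.
print("normalized dot product:", dot_product(normalize(a), normalize(b)))
```

The last line shows why the dot product can stand in for cosine when vectors are already normalized to unit length: the two values coincide, and the dot product skips the division by the vector lengths.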
Use the WITH OPTIONS clause to specify a similarity function when you create an SAI index on a vector column.
CREATE CUSTOM INDEX products_idx
ON products (description) USING 'StorageAttachedIndex'
WITH OPTIONS = {'similarity_function': 'dot_product'};
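Because the similarity function is fixed when the index is created, switching to a different one means dropping the index and creating it again. The Python sketch below generates the two CQL statements for that change as plain strings. The helper names are hypothetical, the set of accepted values assumes the option names `cosine`, `dot_product`, and `euclidean`, and nothing here talks to a cluster.

```python
# Assumed option values for 'similarity_function'.
SIMILARITY_FUNCTIONS = {"cosine", "dot_product", "euclidean"}


def create_vector_index_cql(index, table, column, similarity="cosine"):
    """Build a CREATE CUSTOM INDEX statement for a vector column."""
    if similarity not in SIMILARITY_FUNCTIONS:
        raise ValueError(f"unsupported similarity function: {similarity!r}")
    # Identifiers are interpolated as-is; this sketch assumes trusted input.
    return (
        f"CREATE CUSTOM INDEX {index} ON {table} ({column}) "
        "USING 'StorageAttachedIndex' "
        f"WITH OPTIONS = {{'similarity_function': '{similarity}'}};"
    )


def recreate_vector_index_cql(index, table, column, similarity):
    """Return the statements that swap an index to a new similarity function."""
    return [
        f"DROP INDEX IF EXISTS {index};",
        create_vector_index_cql(index, table, column, similarity),
    ]


for statement in recreate_vector_index_cql(
    "products_idx", "products", "description", "euclidean"
):
    print(statement)
```

Validating the function name before building the statement catches a misspelled option on the client side rather than at index creation time.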
Insert vector data
You can insert vector data using the CQL INSERT statement.
Vector literals are comma-separated lists of floating-point values enclosed in square brackets ([]).
This example inserts four rows with vector data into the products table.
INSERT INTO products (product_id, categories, name, price, description)
VALUES (uuid(), {'electronics', 'audio'}, 'Wireless Headphones', 79.99,
[0.12, 0.34, 0.56, 0.78]);
INSERT INTO products (product_id, categories, name, price, description)
VALUES (uuid(), {'electronics', 'gaming'}, 'Gaming Mouse', 49.99,
[0.22, 0.18, 0.91, 0.44]);
INSERT INTO products (product_id, categories, name, price, description)
VALUES (uuid(), {'home', 'kitchen'}, 'chopping board', 89.50,
[0.05, 0.67, 0.33, 0.21]);
INSERT INTO products (product_id, categories, name, price, description)
VALUES (uuid(), {'electronics', 'fitness', 'health'}, 'heart monitor', 25.00,
[0.88, 0.12, 0.45, 0.66]);
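When vectors come from application code, a small helper can format them as vector literals and catch a dimension mismatch before the statement is sent. The following Python sketch shows one way to do that. The function name `to_vector_literal` is hypothetical, and the sketch only builds the literal text. It does not execute any CQL.

```python
def to_vector_literal(values, dimensions):
    """Format a sequence of numbers as a CQL vector literal.

    Raises ValueError if the number of values does not match the
    dimension count declared for the vector column.
    """
    if len(values) != dimensions:
        raise ValueError(f"expected {dimensions} values, got {len(values)}")
    # repr() of a float gives the shortest text that round-trips the value.
    return "[" + ", ".join(repr(float(v)) for v in values) + "]"


# The description column above is declared as VECTOR<FLOAT, 4>.
print(to_vector_literal([0.12, 0.34, 0.56, 0.78], 4))

# A vector with the wrong number of dimensions is rejected up front.
try:
    to_vector_literal([0.12, 0.34, 0.56], 4)
except ValueError as err:
    print("rejected:", err)
```

This only illustrates the literal format. In application code, passing the vector as a bound parameter through a driver is usually preferable to building statement text by hand.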