Run Approximate nearest neighbor (ANN) queries
Vector similarity is measured as proximity, where each vector represents a point in a n-dimensional space and its nearest neighbors are the most similar.
An ANN query or vector search takes a query vector and attempts to find its nearest neighbors based on the similarity function defined in the vector index. ANN queries are optimized to find vectors that are approximately nearest to the query vector. Queries to find the exact nearest neighbors can be significantly slower, especially with large datasets, and are not supported in Cassandra-based databases.
Prerequisites
Befor you can run ANN queries, you must have a table with a vector column and an SAI index on that column.
The examples on this page use a table named products with a vector column named description and an SAI index on that column.
This is the table definition used for the examples on this page:
CREATE TABLE products (
product_id UUID PRIMARY KEY,
categories SET<TEXT>,
name TEXT,
price DECIMAL,
description VECTOR<FLOAT, 4>
);
This is the SAI index definition used for the examples on this page:
CREATE CUSTOM INDEX products_idx
ON products (description) USING 'StorageAttachedIndex';
Simple ANN query
Since ANN queries attempt to find similar vectors, you must provide a query vector to compare against the vectors stored in the table.
You can perform an ANN query using the CQL SELECT statement with the ORDER BY and ANN OF operators in the WHERE clause.
You must also specify a LIMIT to restrict the number of results returned.
More results may significantly increase the query time, so DataStax recommends using a LIMIT of less than 100 results.
This example query finds the 3 rows in the product table whose description vectors are approximately the most similar to the query vector [0.15, 0.1, 0.1, 0.35].
SELECT * FROM products
ORDER BY description ANN OF [0.15, 0.1, 0.1, 0.35]
LIMIT 3;
Filtering
Just like other CQL queries, you can filter ANN queries using a WHERE clause.
WHERE clause conditions can only be applied to the partition key columns, clustering columns, or indexed columns.
Filtering with a WHERE clause is very efficient in Cassandra-based databases.
See SELECT usage notes for supported operators.
|
The |
Use a simple filter
The products table has a price column.
You will need to create an SAI index on the price column to use it in a WHERE clause.
This CQL statement creates an SAI index on the price column.
CREATE CUSTOM INDEX products_price_idx
ON products (price) USING 'StorageAttachedIndex';
Once the price column is indexed, you can use the WHERE clause to filter the ANN query results to only include products with a price between 50.00 and 200.00 dollars.
SELECT * FROM products WHERE price > 50.00 AND price < 200.00
ORDER BY description ANN OF [0.11,0.22,0.33,0.44] LIMIT 5;
Use a complex filter
The products table has a categories column of type set<text>.
You can create an index on the categories column and then create a more complex filter using both the categories and price columns.
This CQL statement creates an SAI index on the categories column.
CREATE CUSTOM INDEX products_categories_idx
ON products (categories) USING 'StorageAttachedIndex';
Once the categories column is indexed, you can use the it in filters.
Complex filters can combine multiple indexed columns using AND and OR operators.
This CQL statement filters the ANN query results to only include products in the electronics category with a price between 50.00 and 100.00 dollars.
SELECT * FROM products
WHERE categories CONTAINS 'electronics' AND price >= 50.00 AND price <= 100.00
ORDER BY description ANN OF [0.71,0.22,0.34,0.02] LIMIT 10;