Run Approximate nearest neighbor (ANN) queries

Vector similarity is measured as proximity, where each vector represents a point in a n-dimensional space and its nearest neighbors are the most similar.

An ANN query or vector search takes a query vector and attempts to find its nearest neighbors based on the similarity function defined in the vector index. ANN queries are optimized to find vectors that are approximately nearest to the query vector. Queries to find the exact nearest neighbors can be significantly slower, especially with large datasets, and are not supported in Cassandra-based databases.

Prerequisites

Befor you can run ANN queries, you must have a table with a vector column and an SAI index on that column.

The examples on this page use a table named products with a vector column named description and an SAI index on that column.

This is the table definition used for the examples on this page:

CREATE TABLE products (
    product_id UUID PRIMARY KEY,
    categories SET<TEXT>,
    name TEXT,
    price DECIMAL,
    description VECTOR<FLOAT, 4>
);

This is the SAI index definition used for the examples on this page:

CREATE CUSTOM INDEX products_idx
  ON products (description) USING 'StorageAttachedIndex';

Simple ANN query

Since ANN queries attempt to find similar vectors, you must provide a query vector to compare against the vectors stored in the table. You can perform an ANN query using the CQL SELECT statement with the ORDER BY and ANN OF operators in the WHERE clause. You must also specify a LIMIT to restrict the number of results returned. More results may significantly increase the query time, so DataStax recommends using a LIMIT of less than 100 results.

This example query finds the 3 rows in the product table whose description vectors are approximately the most similar to the query vector [0.15, 0.1, 0.1, 0.35].

SELECT * FROM products
  ORDER BY description ANN OF [0.15, 0.1, 0.1, 0.35]
  LIMIT 3;

Filtering

Just like other CQL queries, you can filter ANN queries using a WHERE clause. WHERE clause conditions can only be applied to the partition key columns, clustering columns, or indexed columns. Filtering with a WHERE clause is very efficient in Cassandra-based databases.

See SELECT usage notes for supported operators.

The IN operator is not supported in the WHERE clause for indexed columns.

Use a simple filter

The products table has a price column. You will need to create an SAI index on the price column to use it in a WHERE clause.

This CQL statement creates an SAI index on the price column.

CREATE CUSTOM INDEX products_price_idx
  ON products (price) USING 'StorageAttachedIndex';

Once the price column is indexed, you can use the WHERE clause to filter the ANN query results to only include products with a price between 50.00 and 200.00 dollars.

SELECT * FROM products WHERE price > 50.00 AND price < 200.00
  ORDER BY description ANN OF [0.11,0.22,0.33,0.44] LIMIT 5;

Use a complex filter

The products table has a categories column of type set<text>. You can create an index on the categories column and then create a more complex filter using both the categories and price columns.

This CQL statement creates an SAI index on the categories column.

CREATE CUSTOM INDEX products_categories_idx
  ON products (categories) USING 'StorageAttachedIndex';

Once the categories column is indexed, you can use the it in filters. Complex filters can combine multiple indexed columns using AND and OR operators.

This CQL statement filters the ANN query results to only include products in the electronics category with a price between 50.00 and 100.00 dollars.

SELECT * FROM products
  WHERE categories CONTAINS 'electronics' AND price >= 50.00 AND price <= 100.00
  ORDER BY description ANN OF [0.71,0.22,0.34,0.02] LIMIT 10;

Run Approximate nearest neighbor (ANN) queries

Prerequisites

Simple ANN query

Filtering

Use a simple filter

Use a complex filter

See also

Was this helpful?

Give Feedback