Query vector data with CQL

The vector data type in Cassandra Query Language (CQL) enables vector search over your data. With CQL, you can create a schema and an index, load vector data into your database, and run a vector search.

Your Serverless (Vector) database is ready to query vector data with CQL.

Prerequisites

Create the vector schema

  1. In CQLSH, select the keyspace to use for your vector search table.

    This example uses default_keyspace as the keyspace name.

    USE default_keyspace;
  2. Create a new table in your keyspace with a five-dimensional vector column.

    CREATE TABLE IF NOT EXISTS default_keyspace.products
    (
      id int PRIMARY KEY,
      name TEXT,
      description TEXT,
      item_vector VECTOR<FLOAT, 5> // create a five-dimensional embedding
    );
  3. Create the index:

    CREATE INDEX IF NOT EXISTS ann_index
      ON default_keyspace.products(item_vector)
      WITH OPTIONS = {'source_model': 'other'};

    The source_model option configures the index with the fastest settings for a given source of embedding vectors. Valid source_model values are openai-v3-large, openai-v3-small, ada002, gecko, nv-qa-4, cohere-v3, bert, and other. The default is other.

    To change index settings, you must drop and rebuild the index. For more information about indexes, see Storage Attached Indexing overview.

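    Instead of source_model, you can set a specific similarity function for your index. If you set source_model, you don’t need to include similarity_function. Because the following statement reuses the name ann_index, IF NOT EXISTS makes it a no-op if the earlier index already exists, so drop that index first.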

    CREATE INDEX IF NOT EXISTS ann_index
      ON default_keyspace.products(item_vector)
      WITH OPTIONS = {'similarity_function': 'DOT_PRODUCT'};

    Valid values for similarity_function are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
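
The choice of similarity function matters most when your embeddings aren't unit length. The following Python sketch runs outside the database and compares a plain dot product with cosine similarity for two of the sample vectors used later on this page. The helper names (dot, cosine, normalize) are illustrative assumptions and don't correspond to any CQL or Cassandra API.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine of the angle between a and b; independent of vector length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalize(v):
    """Scale v to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

# Two sample vectors from this page (neither is unit length).
v1 = [0.1, 0.15, 0.3, 0.12, 0.05]
v2 = [0.45, 0.09, 0.01, 0.2, 0.11]

raw_dot = dot(v1, v2)                         # depends on vector magnitude
cos = cosine(v1, v2)                          # depends only on direction
unit_dot = dot(normalize(v1), normalize(v2))  # dot product after normalizing

print(f"raw dot product:        {raw_dot:.4f}")
print(f"cosine similarity:      {cos:.4f}")
print(f"dot product, unit-norm: {unit_dot:.4f}")
```

On unit-length vectors the dot product and the cosine are the same number, so both functions produce the same ranking. On vectors that aren't normalized, the two can rank rows differently.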

Load the data into the database

Insert sample data into the table, populating the item_vector column with five-dimensional vectors:

INSERT INTO default_keyspace.products (id, name, description, item_vector) VALUES
(
  1, // id
  'Coded Cleats', // name
  'Chat bot integrated sneakers that talk to you', // description
  [0.1, 0.15, 0.3, 0.12, 0.05] // item_vector
);

INSERT INTO default_keyspace.products (id, name, description, item_vector) VALUES
(
  2,
  'Logic Layers',
  'An AI quilt to help you sleep forever',
  [0.45, 0.09, 0.01, 0.2, 0.11]
);

INSERT INTO default_keyspace.products (id, name, description, item_vector) VALUES
(
  5,
  'Vision Vector Frame',
  'A deep learning display that controls your mood',
  [0.1, 0.05, 0.08, 0.3, 0.6]
);
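
Because item_vector is declared as VECTOR<FLOAT, 5>, every inserted vector must have exactly five components. If you generate INSERT statements from application data, a small check before you build the statement can catch malformed vectors early. The following Python sketch is one way to do that. The function name to_cql_vector_literal and the constant EXPECTED_DIMENSIONS are hypothetical and not part of any driver.

```python
import math

EXPECTED_DIMENSIONS = 5  # matches VECTOR<FLOAT, 5> in the table definition

def to_cql_vector_literal(values, dimensions=EXPECTED_DIMENSIONS):
    """Validate a vector and render it as a CQL list-style literal."""
    if len(values) != dimensions:
        raise ValueError(f"expected {dimensions} components, got {len(values)}")
    for value in values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"non-numeric component: {value!r}")
        if not math.isfinite(value):
            raise ValueError(f"non-finite component: {value!r}")
    return "[" + ", ".join(repr(float(v)) for v in values) + "]"

literal = to_cql_vector_literal([0.1, 0.15, 0.3, 0.12, 0.05])
print(literal)  # prints [0.1, 0.15, 0.3, 0.12, 0.05]
```

This sketch only formats the literal. It doesn't connect to a database or execute any statement.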

Query vector data with CQL

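To run a vector search, use a SELECT query that orders rows by approximate nearest neighbor (ANN) proximity to a query vector. The LIMIT clause sets how many rows to return: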

SELECT * FROM default_keyspace.products
  ORDER BY item_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55]
  LIMIT 1;
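
To see which row this query should return, you can rank the three sample rows by hand. The following Python sketch does an exact, brute-force search with cosine similarity over the vectors from the INSERT statements. It is a stand-in for illustration only: the database uses an approximate index rather than a full scan, and the names PRODUCTS and exact_nearest are hypothetical.

```python
import math

# Sample rows copied from the INSERT statements: id -> (name, item_vector)
PRODUCTS = {
    1: ("Coded Cleats", [0.1, 0.15, 0.3, 0.12, 0.05]),
    2: ("Logic Layers", [0.45, 0.09, 0.01, 0.2, 0.11]),
    5: ("Vision Vector Frame", [0.1, 0.05, 0.08, 0.3, 0.6]),
}

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_nearest(query, rows, limit=1):
    """Score every row against the query and return the top `limit` rows."""
    scored = [
        (cosine(query, vector), row_id, name)
        for row_id, (name, vector) in rows.items()
    ]
    scored.sort(key=lambda item: item[0], reverse=True)  # highest score first
    return scored[:limit]

best = exact_nearest([0.15, 0.1, 0.1, 0.35, 0.55], PRODUCTS, limit=1)
for score, row_id, name in best:
    print(row_id, name, round(score, 4))  # Vision Vector Frame (id 5) ranks first
```

On a table this small, the ANN query is expected to return the same row as the exact search. On large tables, an approximate index trades a small amount of recall for speed, so results can occasionally differ from a full scan.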

Calculate the similarity

You can return a similarity score alongside the best-scoring row of a vector search. In applications where relevance matters, the score shows how close the best match actually is, rather than only that it ranked first.

The supported functions for this type of query are similarity_dot_product, similarity_cosine, and similarity_euclidean. Each function takes two vector arguments, referred to here as VECTOR_COLUMN and EMBEDDING_VALUE: the vector column to score and the query vector to compare it against.

Use a SELECT query to find the row that is most similar to the vector in the search query.

SELECT description, similarity_cosine(item_vector, [0.1, 0.15, 0.3, 0.12, 0.05])
  FROM default_keyspace.products
  ORDER BY item_vector ANN OF [0.1, 0.15, 0.3, 0.12, 0.05]
  LIMIT 1;
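
In this query, the search vector is identical to the item_vector stored for row 1, so that row is a perfect match. The following Python sketch reproduces the calculation locally. It rests on an assumption you should verify against your database version: that the reported score rescales the raw cosine from the range [-1, 1] onto [0, 1] as (1 + cosine) / 2. The function names are hypothetical.

```python
import math

def cosine(a, b):
    """Raw cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rescaled_cosine_score(a, b):
    """ASSUMPTION: score = (1 + cosine) / 2, which maps [-1, 1] onto [0, 1]."""
    return (1 + cosine(a, b)) / 2

query = [0.1, 0.15, 0.3, 0.12, 0.05]
row_1 = [0.1, 0.15, 0.3, 0.12, 0.05]   # 'Coded Cleats', identical to the query
row_2 = [0.45, 0.09, 0.01, 0.2, 0.11]  # 'Logic Layers'

for label, vector in (("row 1", row_1), ("row 2", row_2)):
    print(label, round(cosine(query, vector), 4),
          round(rescaled_cosine_score(query, vector), 4))
```

For identical vectors, the raw cosine and the rescaled score both come out at 1 (up to floating-point rounding), so the perfect match looks the same under either convention. For any other row, the two values differ, which is why it helps to confirm which scale your scores use before you apply a threshold.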

What’s next?

To learn how to filter your vector search by specific terms, see Use analyzers with CQL.

