Query vector data with CQL

The CQL vector data type enables vector search on a table. With CQL, you can create a schema and an index, load vector data into your database, and then perform a vector search.

Create the vector schema

  1. In the CQL shell, select the keyspace where you want to store vector data. The following example uses default_keyspace.

    USE default_keyspace;
  2. Create a new table in your keyspace that has at least one vector column. The following example creates a table with four columns. The vector column has a dimensionality of 5, so every value stored in it must be a five-dimensional vector embedding.

    CREATE TABLE IF NOT EXISTS default_keyspace.products
    (
      id INT PRIMARY KEY,
      name TEXT,
      description TEXT,
      item_vector VECTOR<FLOAT, 5>
    );
  3. Create a vector index:

    CREATE INDEX IF NOT EXISTS ann_index
      ON default_keyspace.products(item_vector)
      WITH OPTIONS = {'source_model': 'other'};

    The source_model option configures the index with the fastest settings for a given embeddings model. The available options are openai-v3-large, openai-v3-small, ada002, gecko, nv-qa-4, cohere-v3, bert, and other. The default is other.

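    Alternatively, you can base the index on a similarity metric. Run this statement instead of the previous one, not after it: both use the name ann_index with IF NOT EXISTS, so the second statement is skipped if the index already exists.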

    CREATE INDEX IF NOT EXISTS ann_index
      ON default_keyspace.products(item_vector)
      WITH OPTIONS = {'similarity_function': 'DOT_PRODUCT'};

    Valid values for similarity_function are COSINE (default), DOT_PRODUCT, and EUCLIDEAN. If you specify a source_model, you don’t need to include a similarity_function.

    To change index settings, you must drop and rebuild the index. For more information about indexes, see Storage Attached Indexing overview.
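
To make the choice of similarity metric in step 3 more concrete, the following is a minimal sketch in plain Python (not CQL, and not part of the database) that compares the raw dot product and cosine of two of the sample vectors used on this page. The helper names dot, cosine, and normalize are illustrative. The sketch shows a general mathematical property: once both vectors are scaled to unit length, their dot product equals their cosine similarity.

```python
import math

def dot(a, b):
    """Raw dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine of the angle between a and b (independent of vector length)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

# Two of the sample embeddings from this page.
a = [0.1, 0.15, 0.3, 0.12, 0.05]
b = [0.45, 0.09, 0.01, 0.2, 0.11]

raw_dot = dot(a, b)                          # depends on vector lengths
unit_dot = dot(normalize(a), normalize(b))   # dot product of unit vectors
cos_ab = cosine(a, b)                        # same value as unit_dot

print(raw_dot, unit_dot, cos_ab)
```

The sample vectors are not unit length, so their raw dot product differs from their cosine. Whether your embeddings are already normalized depends on the model that produced them.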

Load the data into the database

Insert data with embeddings:

INSERT INTO default_keyspace.products (id, name, description, item_vector) VALUES
(
  1, // id
  'Coded Cleats', // name
  'Chat bot integrated sneakers that talk to you', // description
  [0.1, 0.15, 0.3, 0.12, 0.05] // item_vector
);

INSERT INTO default_keyspace.products (id, name, description, item_vector) VALUES
(
  2,
  'Logic Layers',
  'An AI quilt to help you sleep forever',
  [0.45, 0.09, 0.01, 0.2, 0.11]
);

INSERT INTO default_keyspace.products (id, name, description, item_vector) VALUES
(
  5,
  'Vision Vector Frame',
  'A deep learning display that controls your mood',
  [0.1, 0.05, 0.08, 0.3, 0.6]
);
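
Each vector in the statements above has exactly five components, matching the VECTOR<FLOAT, 5> column type. If you generate such statements from application code, a length check before formatting can catch mismatches early. The following is a minimal sketch in plain Python. The helper vector_literal is hypothetical and only builds the bracketed literal text. With a real driver, bound parameters are usually preferable to assembling statements as strings.

```python
def vector_literal(vec, dim=5):
    """Format a sequence of numbers as a bracketed vector literal.

    Hypothetical helper: dim defaults to 5 to match the example table's
    VECTOR<FLOAT, 5> column.
    """
    if len(vec) != dim:
        raise ValueError(f"expected {dim} components, got {len(vec)}")
    return "[" + ", ".join(repr(float(x)) for x in vec) + "]"

literal = vector_literal([0.1, 0.15, 0.3, 0.12, 0.05])
print(literal)  # [0.1, 0.15, 0.3, 0.12, 0.05]
```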

Query vector data with CQL

Use a SELECT statement to perform a vector search on your table:

SELECT * FROM default_keyspace.products
  ORDER BY item_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55]
  LIMIT 1;
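
As a sanity check on what this query should return, the following sketch ranks the three sample rows against the query vector by exact cosine similarity in plain Python. It assumes the index uses the COSINE metric, and it is an exhaustive comparison rather than the approximate search the database performs, so it only illustrates the expected result for this small data set.

```python
import math

# The three rows inserted earlier on this page: id -> (name, item_vector).
rows = {
    1: ("Coded Cleats", [0.1, 0.15, 0.3, 0.12, 0.05]),
    2: ("Logic Layers", [0.45, 0.09, 0.01, 0.2, 0.11]),
    5: ("Vision Vector Frame", [0.1, 0.05, 0.08, 0.3, 0.6]),
}
query = [0.15, 0.1, 0.1, 0.35, 0.55]

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Sort rows from most to least similar to the query vector.
ranked = sorted(rows.items(), key=lambda item: cosine(item[1][1], query), reverse=True)
best_id, (best_name, _) = ranked[0]
print(best_id, best_name)  # 5 Vision Vector Frame
```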

Calculate the similarity

You can use a SELECT statement to calculate the similarity score of the matching row returned by the vector search. The score indicates how close the match is, which lets an application rank results or discard weak matches.

The supported functions for this type of query are similarity_dot_product, similarity_cosine, and similarity_euclidean. The similarity function you use depends on your embeddings model.

Use a SELECT query to find the row that is most similar to the query vector, and then calculate the similarity score between the matching row’s vector and the query vector:

SELECT description, similarity_cosine(item_vector, [0.1, 0.15, 0.3, 0.12, 0.05])
  FROM default_keyspace.products
  ORDER BY item_vector ANN OF [0.1, 0.15, 0.3, 0.12, 0.05]
  LIMIT 1;
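
In this query, the query vector is identical to the vector stored for id 1, so the raw cosine between them is 1. The following plain-Python sketch reproduces that calculation. The function score_from_cosine is an assumption, not something this page specifies: it maps the raw cosine from the range [-1, 1] onto [0, 1] using (1 + cos) / 2, a convention used by some vector search engines. Check your database's documentation for the exact scoring formula.

```python
import math

def cosine(a, b):
    """Raw cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def score_from_cosine(cos):
    # ASSUMPTION: rescale the raw cosine from [-1, 1] to [0, 1].
    # This mapping is illustrative and not confirmed by this page.
    return (1 + cos) / 2

stored = [0.1, 0.15, 0.3, 0.12, 0.05]  # item_vector for id 1
query = [0.1, 0.15, 0.3, 0.12, 0.05]   # query vector from the example

raw = cosine(stored, query)      # approximately 1.0 for identical vectors
score = score_from_cosine(raw)   # approximately 1.0 under the assumed mapping
print(raw, score)
```

Under this assumed mapping, orthogonal vectors score 0.5 and opposite vectors score 0.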

Next steps

To use a standard query filter and vector search together, see Use analyzers with CQL.

© 2024 DataStax | Privacy policy | Terms of use
