Perform a vector search

After you load data into a collection, you can use the Data Explorer in the Astra Portal to view your data, search for similar vectors, and filter by metadata.

A vector search determines the similarity between a query vector and the vectors of the documents in the collection. Each document’s resulting similarity score represents the closeness of the query vector and the document’s vector.
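As a rough illustration of what a similarity score measures, the following sketch computes cosine similarity between a query vector and a few document vectors in plain Python. The vectors, the document IDs, and the choice of cosine are assumptions made for illustration. The actual metric is the one configured on the collection, and the database may scale its scores differently.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical document vectors keyed by document ID.
documents = {
    "doc1": [1.0, 0.0],
    "doc2": [0.6, 0.8],
    "doc3": [0.0, 1.0],
}
query_vector = [1.0, 0.0]

# Score every document, then rank from most similar to least similar.
scores = {
    doc_id: cosine_similarity(query_vector, vector)
    for doc_id, vector in documents.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A higher score means the document's vector points in a direction closer to the query vector's direction.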

To perform a vector search, you need a role that can view the database and collection that you want to search. To perform a vector search with the Data API, you need an application token with this role. You can use a built-in role or a custom role with the following permissions: View DB, Describe All Keyspaces, Describe Keyspace, Select Table, and Describe Table.

Search your data

You can use the Astra Portal or the Data API to perform a vector search.

Search your data with vectorize

For collections that auto-generate embeddings with vectorize, you can perform a similarity search using text, rather than a vector. Vectorize generates an embedding for your text query, and then performs a similarity search based on that embedding.
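Conceptually, a vectorize search is two steps: embed the query text, then rank documents by similarity to that embedding. The sketch below imitates this flow locally. The `toy_embed` function is a hypothetical stand-in (word counts over a tiny fixed vocabulary), not the embedding provider's model, and the sample texts are made up.

```python
import math

VOCAB = ["shoes", "talking", "car", "green"]  # hypothetical tiny vocabulary

def toy_embed(text):
    """Stand-in for an embedding model: count vocabulary words in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(a, b):
    """Cosine similarity that returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search_with_text(query, texts, limit=2):
    """Embed the query text, then return the most similar texts first."""
    query_vector = toy_embed(query)
    scored = [(cosine(query_vector, toy_embed(t)), t) for t in texts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:limit]]

texts = ["a green car", "talking shoes for sale", "plain shoes"]
print(search_with_text("I'd like some talking shoes", texts))
```

With a real embedding provider, the ranking reflects semantic similarity rather than shared words, but the two-step shape is the same.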

You can use the Astra Portal or the Data API to perform a search with vectorize.

  • Astra Portal

  • Python

  • TypeScript

  • Java

  • curl

  1. In the Astra Portal, go to Databases, and then select your Serverless (Vector) database.

  2. Click Data Explorer.

  3. Select the Namespace and Collection that contain the data you want to view.

    In the Collection Data section, the ($vectorize) label indicates the field that you designated to auto-generate embeddings for this collection’s documents. The $vector field contains the generated embeddings.

  4. In the Vector Search field, enter a text query, and then click Apply.

    Using the collection’s embedding provider integration, Astra DB generates a vector for your text query, and then performs a similarity search.

    The Collection Data section sorts the data based on the calculated similarity score for each document, from most similar to least similar. Similarity scores are based on the similarity metric that you chose when you created the collection.

  5. Optional: Use metadata filters to refine the search results based on other fields in the collection:

    1. Click Add Filter, and then configure the filter:

      • Key: Select the field to filter on.

      • Condition: Select the filter operator to use. The is condition performs an exact match of a scalar value or of a value within an array. The contains condition performs an exact match of a value within an array only. Some data types have a default condition.

        The Data API supports more operators than the Astra Portal. If you need more filtering options, consider using the Data API clients.

      • Value: Enter a filter value.

        All conditions are case sensitive, and the filter value must be an exact match.

        Filter example: is

        For this example, assume that you have the following filter:

        • Key: character

        • Condition: is

        • Value: Lassie

        This filter returns all documents with a character field set to a scalar value of "Lassie" or set to an array containing a value of "Lassie".

        This matches values like "Lassie" and ["Lassie", "Timmy"], but it does not match values like "lassie", "Lassie Come Home", or ["lassie", "Timmy"].

        Filter example: contains

        For this example, assume that you have the following filter:

        • Key: color

        • Condition: contains

        • Value: red

        This filter returns all documents with a color field set to an array containing a value of "red".

        This matches values like ["red", "blue", "green"] and ["red"], but it does not match values like "red", ["reddish", "Red", "Green", "Blue"], or ["green", "blue"].

    2. To add more filters, click Add Filter again.

    3. Click Apply to refresh the Collection Data section based on your filters.
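The matching rules for the is and contains conditions in step 5 can be sketched in a few lines of Python. The function names `matches_is` and `matches_contains` are hypothetical and model only the behavior described in the filter examples, not the Astra Portal's implementation.

```python
def matches_is(field_value, target):
    """'is': exact match of a scalar, or of any element when the field is an array."""
    if isinstance(field_value, list):
        return target in field_value
    return field_value == target

def matches_contains(field_value, target):
    """'contains': exact match of an element; the field must be an array."""
    return isinstance(field_value, list) and target in field_value

# Values taken from the filter examples in step 5.
for value in ["Lassie", ["Lassie", "Timmy"], "lassie", "Lassie Come Home"]:
    print("is Lassie:", value, matches_is(value, "Lassie"))

for value in [["red", "blue", "green"], "red", ["reddish", "Red"]]:
    print("contains red:", value, matches_contains(value, "red"))
```

Both checks compare whole values with case-sensitive equality, which is why "lassie" and "reddish" do not match.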

Python

For more information about this command and related commands, see the Documents reference. For a complete list of filter conditions, see Data API operators.

Perform a vector search with vectorize:

# Perform a similarity search
# Assumes `collection` is an existing collection object that uses vectorize
query = "I'd like some talking shoes"
results = collection.find(
    sort={"$vectorize": query},
    limit=2,
    projection={"$vectorize": True},
    include_similarity=True,
)
print(f"Vector search results for '{query}':")
for document in results:
    print("    ", document)

Perform a vector search with vectorize and metadata filters:

# Perform a similarity search with metadata filters
query = "I'd like some talking shoes"
results = collection.find(
    {"$and": [
        {"price": {"$gte": 100}},
        {"name": "John"}
    ]},
    sort={"$vectorize": query},
    limit=10,
    projection={"$vectorize": True},
)
print("Vector search results:")
for document in results:
    print("    ", document)

For vector approximate nearest neighbor (ANN) search, the response is a single page of up to 1000 documents, unless you set a lower limit.

You can use a projection to include specific document properties in the response. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

TypeScript

For more information about this command and related commands, see the Documents reference. For a complete list of filter conditions, see Data API operators.

Perform a vector search with vectorize:

  // Perform a similarity search
  const cursor = await collection.find({}, {
    sort: { $vectorize: 'shoes' },
    limit: 2,
    includeSimilarity: true,
  });

  console.log('* Search results:');
  for await (const doc of cursor) {
    console.log('  ', doc.text, doc.$similarity);
  }

Perform a vector search with vectorize and metadata filters:

  // Perform a similarity search with metadata filters
  const cursor = await collection.find({
    $and: [
      { price: { $gte: 100 } },
      { name: 'John' }
    ]
  }, {
    sort: { $vectorize: 'shoes' },
    limit: 10,
    includeSimilarity: true,
  });

  console.log('* Search results:');
  for await (const doc of cursor) {
    console.log('  ', doc);
  }

For vector ANN search, the response is a single page of up to 1000 documents, unless you set a lower limit.

You can use a projection to include specific document properties in the response. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

Java

For more information about this command and related commands, see the Documents reference. For a complete list of filter conditions, see Data API operators.

Perform a vector search with vectorize:

// Perform a similarity search
FindOptions findOptions = new FindOptions()
       .limit(2)
       .includeSimilarity()
       .sort("I'd like some talking shoes");
FindIterable<Document> results = collection.find(findOptions);
for (Document document : results) {
   System.out.println("Document: " + document);
}

You can use metadata filters with a vectorize vector search in the same way that you would with a regular vector search.

For vector ANN search, the response is a single page of up to 1000 documents, unless you set a lower limit.

You can use a projection to include specific document properties in the response. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

curl

For more information about this command and related commands, see the Documents reference. For a complete list of filter conditions, see Data API operators.

Perform a vector search with vectorize:

# Perform a similarity search
curl -sS --location -X POST "$ASTRA_DB_API_ENDPOINT/api/json/v1/default_keyspace/COLLECTION_NAME" \
--header "Token: $ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "sort": {"$vectorize": "Talking shoes"},
    "projection": {"$vectorize": 1},
    "options": {
      "includeSimilarity": true,
      "limit": 10
    }
  }
}' | jq

Perform a vector search with vectorize and metadata filters:

# Perform a similarity search with metadata filters
curl -sS --location -X POST "$ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: $ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "filter": {
      "$and": [
        { "customer.credit_score": { "$gte": 700 } },
        { "customer.credit_score": { "$lt": 800 } }
      ]
    },
    "sort": { "$vectorize": "green car" },
    "options": {
      "limit": 100
    }
  }
}' | jq

For vector ANN search, the response is a single page of up to 1000 documents, unless you set a lower limit.

You can use a projection to include specific document properties in the response. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.
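The curl examples send a JSON find command. As a sketch, the same payload can be assembled in Python before you send it with an HTTP client of your choice. The helper name `build_find_command` and its parameters are hypothetical. Only the JSON shape (`find`, `filter`, `sort`, `projection`, `options`) is taken from the curl examples.

```python
import json

def build_find_command(query_text, metadata_filter=None, limit=None,
                       include_similarity=False, include_vectorize=False):
    """Build a Data API find payload for a vectorize search (hypothetical helper)."""
    find = {"sort": {"$vectorize": query_text}}
    if metadata_filter is not None:
        find["filter"] = metadata_filter
    if include_vectorize:
        # $vectorize is excluded from results unless a projection requests it.
        find["projection"] = {"$vectorize": 1}
    options = {}
    if include_similarity:
        options["includeSimilarity"] = True
    if limit is not None:
        options["limit"] = limit
    if options:
        find["options"] = options
    return {"find": find}

payload = build_find_command(
    "green car",
    metadata_filter={"$and": [
        {"customer.credit_score": {"$gte": 700}},
        {"customer.credit_score": {"$lt": 800}},
    ]},
    limit=100,
)
body = json.dumps(payload, indent=2)
print(body)
```

Building the payload as a dictionary and serializing it with `json.dumps` avoids hand-written JSON mistakes such as a missing comma or single-quoted strings.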

© 2024 DataStax | Privacy policy | Terms of use
