Perform a vector search

The Data Explorer in the Astra Portal lets you view your data, search for similar vectors, and filter by metadata.

Prerequisites

The steps on this page assume the following:

Search your data

You can perform a vector search in the Astra Portal or with one of the clients.

Search your data with vectorize

If you’ve configured your collection to auto-generate embeddings using an embedding provider, you can perform a similarity search with a text query instead of a vector:

  • Astra Portal

  • Python

  • TypeScript

  • Java

Use the Astra Portal to perform a search with vectorize:

  1. In the Astra Portal, go to Databases, and then select your Serverless (Vector) database.

  2. Click Data Explorer.

  3. Select the Namespace and Collection that contain the data you want to view.

    Your data is displayed in the Collection Data section. The field you configured to auto-generate embeddings is marked with ($vectorize) in the column title. The $vector field contains the generated embeddings.

  4. Enter a text query into the Hybrid Search field, and then click Apply.

    Astra DB auto-generates a vector from the text query and performs a similarity search. The search uses the similarity metric that you chose when you created the collection.

  5. Optional: Use Add Filter to filter your search results by the other fields in the collection. For more information about using filters, see Add a metadata filter.

The Collection Data section updates to show the rows that match your search criteria.
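
To make the idea of a similarity search concrete, the following is a minimal local sketch, not Astra DB's implementation. It assumes the cosine metric and ranks a few hypothetical documents (made-up IDs and 3-dimensional vectors) against a query vector. The helper names `cosine_similarity` and `top_k` are illustrative only, and the raw cosine values computed here are not necessarily scaled the same way as the similarity score the database returns.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, k=2):
    """Rank documents by similarity to the query, most similar first."""
    scored = [
        (cosine_similarity(query_vector, doc["$vector"]), doc["_id"])
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical documents with made-up embeddings (assumption, not real data).
documents = [
    {"_id": "a", "$vector": [1.0, 0.0, 0.0]},
    {"_id": "b", "$vector": [0.0, 1.0, 0.0]},
    {"_id": "c", "$vector": [0.9, 0.1, 0.0]},
]

ranked = top_k([1.0, 0.0, 0.0], documents, k=2)
for score, doc_id in ranked:
    print(doc_id, round(score, 4))
```

With vectorize, the query vector is generated from your text by the embedding provider, so you never compute it yourself. The ranking step is conceptually the same.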

Use the Python client to perform a search with vectorize:

# Perform a similarity search.
# Assumes `collection` is an existing Collection object for a vectorize-enabled collection.
query = "I'd like some talking shoes"
results = collection.find(
    sort={"$vectorize": query},
    limit=2,
    projection={"$vectorize": True},
    include_similarity=True,
)
print(f"Vector search results for '{query}':")
for document in results:
    print("    ", document)

Use the TypeScript client to perform a search with vectorize:

  // Perform a similarity search
  const cursor = await collection.find({}, {
    sort: { $vectorize: 'shoes' },
    limit: 2,
    includeSimilarity: true,
  });

  console.log('* Search results:')
  for await (const doc of cursor) {
    console.log('  ', doc.text, doc.$similarity);
  }

Use the Java client to perform a search with vectorize:

// Perform a similarity search
FindOptions findOptions = new FindOptions()
       .limit(2)
       .includeSimilarity()
       .sort("I'd like some talking shoes");
FindIterable<Document> results = collection.find(findOptions);
for (Document document : results) {
   System.out.println("Document: " + document);
}

Add a metadata filter

You can filter your vector search results by the non-vector fields in the collection.

  • Astra Portal

  • Python

  • TypeScript

  • Java

Use the Astra Portal to add a filter:

  1. In the Hybrid Search section, click Add Filter.

    The displayed fields are:

    Key

    The name of the field to search on.

    Condition

    The type of filtering to perform. All conditions are case-sensitive and require an exact match.

    • is: Matches a scalar value exactly, or a value within an array exactly.

    • contains: Matches a value within an array exactly. Scalar values do not match.

    See Data API operators for the full list of conditions available in the Data API.

    Value

    The value to filter by.

  2. Configure the filter. For example:

    Filter by is

    Filtering by Key: `character`, Condition: `is`, and Value: `Lassie` returns all documents with a `character` field set to a scalar value of `"Lassie"` or set to an array containing a value of `"Lassie"`.

    • This matches the following values: `"Lassie"`, `["Lassie", "Timmy"]`.

    • This does not match the following values: `"lassie"`, `"Lassie Come Home"`, `["lassie", "Timmy"]`.

    Filter by contains

    Filtering by Key: `color`, Condition: `contains`, and Value: `red` returns all documents with a `color` field set to an array containing a value of `"red"`.

    • This matches the following values: `["red", "blue", "green"]`, `["red"]`.

    • This does not match the following values: `"red"`, `["reddish", "Red", "Green", "Blue"]`, `["green", "blue"]`.

  3. Optional: Click Add Filter to add more filter criteria.

  4. Click Apply.

    You can also use the dropdown menu to the right of the filter to limit the number of results returned from the vector search.

The Collection Data section updates to show the rows that match your filters.
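
The difference between the is and contains conditions can be sketched locally in a few lines of Python. This is an illustration of the matching rules described in step 1, not the Portal's or the Data API's code. The function names `matches_is` and `matches_contains` are hypothetical, and the sample values are the ones from the examples in step 2.

```python
def matches_is(field_value, target):
    """is: exact match of a scalar, or of any element when the field is an array."""
    if isinstance(field_value, list):
        return target in field_value
    return field_value == target


def matches_contains(field_value, target):
    """contains: exact match of an element in an array; scalar fields never match."""
    return isinstance(field_value, list) and target in field_value


# Values from the "Filter by is" example.
print(matches_is("Lassie", "Lassie"))
print(matches_is(["Lassie", "Timmy"], "Lassie"))
print(matches_is("Lassie Come Home", "Lassie"))

# Values from the "Filter by contains" example.
print(matches_contains(["red", "blue", "green"], "red"))
print(matches_contains("red", "red"))
print(matches_contains(["reddish", "Red", "Green", "Blue"], "red"))
```

Both helpers compare with `==` and `in`, so matching is case-sensitive and never partial, which mirrors the rules stated for the conditions.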

Use the Python client to perform a vector search with metadata filters:

# Perform a similarity search with metadata filters.
# Assumes `collection` is an existing Collection object whose vectors have 5 dimensions.
query_vector = [0.15, 0.1, 0.1, 0.35, 0.55]
results = collection.find(
    {"$and": [
        {"price": {"$gte": 100}},
        {"name": "John"}
    ]},
    sort={"$vector": query_vector},
    limit=10,
    projection={"*": True},
)
print("Vector search results:")
for document in results:
    print("    ", document)
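
As a rough mental model of what the filtered query above returns, the sketch below runs the same logic over a handful of in-memory documents: keep only documents with `price` of at least 100 and `name` equal to "John", then order the survivors by similarity to the query vector. Everything here is an assumption for illustration: the sample documents, the `filtered_search` helper, and the choice of cosine similarity. It shows the result semantics only and says nothing about the order in which the database applies the filter and the vector ranking.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_search(documents, query_vector, min_price, name, limit=10):
    """Keep documents that pass both conditions, then rank them by similarity."""
    candidates = [
        doc for doc in documents
        if "price" in doc and doc["price"] >= min_price and doc.get("name") == name
    ]
    candidates.sort(
        key=lambda doc: cosine_similarity(query_vector, doc["$vector"]),
        reverse=True,
    )
    return candidates[:limit]


# Hypothetical documents (assumption, not real data).
documents = [
    {"_id": "1", "name": "John", "price": 150, "$vector": [0.15, 0.1, 0.1, 0.35, 0.55]},
    {"_id": "2", "name": "John", "price": 50, "$vector": [0.15, 0.1, 0.1, 0.35, 0.55]},
    {"_id": "3", "name": "John", "price": 200, "$vector": [0.55, 0.35, 0.1, 0.1, 0.15]},
    {"_id": "4", "name": "Jane", "price": 300, "$vector": [0.15, 0.1, 0.1, 0.35, 0.55]},
]

query_vector = [0.15, 0.1, 0.1, 0.35, 0.55]
matches = filtered_search(documents, query_vector, min_price=100, name="John")
print([doc["_id"] for doc in matches])
```

Documents "2" and "4" are excluded by the filter even though their vectors are identical to the query, which is the point of combining a metadata filter with a vector sort.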

Use the TypeScript client to perform a vector search with metadata filters:

  // Perform a similarity search with metadata filters
  const cursor = await collection.find({
    $and: [
      { price: { $gte: 100 } },
      { name: 'John' }
    ]
  }, {
    sort: { $vector: [0.15, 0.1, 0.1, 0.35, 0.55] },
    limit: 10,
    includeSimilarity: true,
  });

  console.log('* Search results:')
  for await (const doc of cursor) {
    console.log('  ', doc);
  }

Use the Java client to perform a vector search with metadata filters:

// Perform a similarity search with metadata filters
FindIterable<Document> resultsSet = collection.find(
    Filters.and(
            Filters.gte("price", 100),
            Filters.eq("name", "John")
    ),
    new float[]{0.15f, 0.1f, 0.1f, 0.35f, 0.55f},
    10
);
resultsSet.forEach(System.out::println);


© 2024 DataStax | Privacy policy | Terms of use
