Find data with hybrid search

Hybrid search, lexical search, and reranking are currently in public preview. Development is ongoing, and the features and functionality are subject to change. Use of Astra DB Serverless is subject to the DataStax Preview Terms.

Hybrid search, lexical search, and reranking are only available for collections in databases in the AWS us-east-2 region.

Hybrid search is an advanced search technique that can improve the overall quality of search results by selecting the most relevant results from a larger set of search results. It combines the contextual breadth of vector search, the specificity of metadata filters, and the precision of lexical tokenization.

While you can use vector search and metadata filters together, hybrid search can impart additional context to the search results as part of the reranking process.

The hybrid search process

Hybrid search compares the results of multiple searches to return a refined set of highly relevant search results. These results can be more contextually relevant than the top results of a standalone search.

Here’s how hybrid search works in Astra DB:

  1. If provided, apply a metadata filter to the dataset.

    This is an optional step that narrows the scope of the search to a specific subset of documents before performing the subsequent vector and lexical searches.

    For example, a filter for productType: "boots" excludes documents with any other productType value, such as sandals or sneakers. As a result, the subsequent operations don’t waste time and resources on irrelevant documents.

  2. Given a query vector, run a vector search on the dataset.

    For example, if you run a vector search based on the phrase "waterproof hiking boots", the results could include documents that match this phrase as well as some similar but contextually irrelevant results, such as water-resistant boots or other types of boots.

  3. Given a string, run a keyword-based lexical search on a tokenized index. This can be the same string as the vector search or a different string.

    Lexical search breaks the string into keywords using delimiters, such as spaces and punctuation. Then, it finds documents containing all of the keywords in any order. To sort the results, it considers factors such as the frequency of the keywords relative to the overall length of a document.

    For example, with the same phrase "waterproof hiking boots", lexical search returns documents that contain all of the words "waterproof", "hiking", and "boots" in any order, with any number of words between them. A document containing "Consumers prefer boots that are waterproof and suitable for hiking" would match, as would a document containing "These boots were made for hiking, but they aren’t waterproof".

  4. Use a reranking model to compare the combined vector and lexical search results, and then return the top results.

    Reranking doesn’t simply reorder the results of both searches. The reranking model incorporates additional factors into the final calculation, such as the relationship between pieces of content that can be lost when chunking.

    The exact calculation depends on the reranking model. Currently, DataStax supports only the NVIDIA llama-3.2-nv-rerankqa-1b-v2 reranking model.
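
The four steps above can be sketched as a small in-memory simulation. This is an illustrative toy, not the Astra DB implementation: the documents, vectors, similarity function, and the reciprocal-rank fusion used in place of a real reranking model are all invented for the example.

```python
# Toy simulation of the hybrid search pipeline: metadata filter,
# vector search, lexical search, then a simple rank fusion standing in
# for a reranking model. All data and scoring here are invented.
import math
import re

docs = [
    {"_id": 1, "productType": "boots",   "text": "waterproof hiking boots for rugged trails", "vec": [0.9, 0.1]},
    {"_id": 2, "productType": "boots",   "text": "water-resistant boots, stylish in the city", "vec": [0.7, 0.3]},
    {"_id": 3, "productType": "sandals", "text": "waterproof hiking sandals", "vec": [0.8, 0.2]},
    {"_id": 4, "productType": "boots",   "text": "these boots were made for hiking, but they aren't waterproof", "vec": [0.6, 0.4]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def tokenize(s):
    # Lexical search splits on delimiters such as spaces and punctuation.
    return set(re.findall(r"[a-z]+", s.lower()))

def hybrid_search(query, query_vec, flt, limit=2):
    # 1. Optional metadata filter narrows the candidate set first.
    pool = [d for d in docs if all(d.get(k) == v for k, v in flt.items())]
    # 2. Vector search: rank candidates by similarity to the query vector.
    vec_rank = sorted(pool, key=lambda d: -cosine(d["vec"], query_vec))
    # 3. Lexical search: keep documents containing all keywords, in any order.
    kw = tokenize(query)
    lex_rank = [d for d in pool if kw <= tokenize(d["text"])]
    # 4. Stand-in for reranking: reciprocal-rank fusion of both result lists.
    scores = {}
    for rank_list in (vec_rank, lex_rank):
        for i, d in enumerate(rank_list):
            scores[d["_id"]] = scores.get(d["_id"], 0.0) + 1.0 / (1 + i)
    fused = sorted(scores, key=lambda _id: -scores[_id])
    return fused[:limit]

top = hybrid_search("waterproof hiking boots", [1.0, 0.0], {"productType": "boots"})
```

Note how the filter removes the sandals document before either search runs, and the document that scores well on both searches rises to the top of the fused list.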

Chunking is a common information-retrieval practice that breaks longer documents or blocks of text into smaller segments. It is an important factor in LLM use cases, and it influences both vector search and lexical search (BM25):

  • Lexical search typically considers the frequency of a keyword relative to the overall length of a document. Chunking can influence lexical search results if a keyword occurs many times in a particular chunk of an otherwise irrelevant result.

  • Vector search is influenced by chunking because vector embeddings are generated from chunks. Larger chunks can dilute important details, whereas smaller chunks can lose semantic meaning derived from the wider context of a sentence or paragraph.

Although lexical and vector search can be influenced by chunking, hybrid search combines and reranks the results of both searches, ultimately reducing the influence of chunking that would be present with either search alone.
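
To make the lexical-search point concrete, here is a simplified BM25-style score for a single keyword. The parameter values (k1, b) are the common textbook defaults, not necessarily what Astra DB uses, and the IDF component is omitted for brevity; the point is only that the same term frequency scores lower in a longer chunk.

```python
# Illustrative BM25-style scoring for one keyword, showing how chunk
# length changes a lexical score. k1 and b are textbook defaults, not
# necessarily Astra DB's values, and the IDF factor is omitted.
def bm25_term_score(tf, doc_len, avg_len, k1=1.2, b=0.75):
    # Term frequency saturates, and longer documents are penalized.
    norm = k1 * (1 - b + b * doc_len / avg_len)
    return tf * (k1 + 1) / (tf + norm)

avg_len = 100  # average chunk length in the collection, in words
short_chunk = bm25_term_score(tf=3, doc_len=50, avg_len=avg_len)   # keyword appears 3x in 50 words
long_chunk = bm25_term_score(tf=3, doc_len=200, avg_len=avg_len)   # same 3x in 200 words
```

The short chunk scores higher for the same keyword count, which is why a keyword-dense chunk of an otherwise irrelevant document can outrank better matches.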

Here are some additional strategies you can use to mitigate potential adverse effects of chunking on hybrid search results.

Use a consistent chunking method

Use the same chunking method for all documents in a collection to ensure consistency in chunk size and chunking parameters. If documents are chunked differently, the lexical and vector search results won’t reflect an accurate comparison.

For example, if a clothing retailer has a collection of product data, they might use chunking to break long product descriptions into smaller segments. To ensure the products are retrieved and scored accurately, all product descriptions must be chunked in the same way.

Astra DB doesn’t control chunking. You must implement chunking before you insert documents into a collection.
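
One simple, consistent strategy is a fixed-size sliding window of words with a small overlap, applied identically to every document. This is a minimal sketch; the chunk size and overlap below are arbitrary example values, and production pipelines often chunk on sentence or paragraph boundaries instead.

```python
# A minimal fixed-size chunker with overlap, applied identically to every
# document before insertion. The size and overlap are arbitrary example
# values; tune them for your own content.
def chunk_words(text, size=50, overlap=10):
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + size >= len(words):
            break
    return chunks

# A 120-word description yields three overlapping 50-word (or shorter) chunks.
description = " ".join(f"word{i}" for i in range(120))
chunks = chunk_words(description)
```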

Support chunks with metadata

Much like vector embeddings for vector search, chunks are not a replacement for other data types. Metadata (non-chunked, non-vector values) highlights important data and provides additional context that can be lost in ambiguous chunks. Together, chunks and metadata create a complete picture of the original, fully formed document.

To continue the previous example, a collection containing product data could include chunked product descriptions along with metadata like the product name, product type, sizes, measurements, colors, and other specifications. Each chunk would be inserted as a separate document with the same set of metadata, and all documents with the same metadata are inherently related by that common metadata.

When you run a hybrid search, you can include additional metadata filters to ensure that your search results don’t miss context that you deem critical. In contrast, searching on chunks alone could produce results that are mathematically accurate but not truly relevant.
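
The pattern described above can be sketched as follows. The product fields (productName, productType, sizes, colors) are invented for this example; $hybrid is the Data API shorthand described later on this page.

```python
# Sketch of chunked documents that share metadata. All field names other
# than $hybrid are invented for this example.
chunks = [
    "Built for wet trails, these boots keep feet dry through streams.",
    "The outsole grips loose rock, and the liner is fully waterproof.",
]

shared_metadata = {
    "productName": "TrailMaster 2000",
    "productType": "boots",
    "sizes": [8, 9, 10, 11],
    "colors": ["brown", "black"],
}

# Each chunk becomes its own document; the shared metadata ties the
# documents back to the same product.
documents = [{"$hybrid": chunk, **shared_metadata} for chunk in chunks]
```

A later filter on productName or productType retrieves every chunk of the product, regardless of which chunk a given search matched.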

Use metadata filters to improve performance

Beyond the contextual benefits described in the previous section, adding a metadata filter to your hybrid search has two benefits:

  • Improve application performance (memory consumption and processing time) by reducing the total number of documents that must be retrieved, scored, and ranked.

  • Improve the relevance of the results by ensuring that the search is scoped to a specific subset of documents.

Run a hybrid search with the Data API

You can use the Data API to run hybrid searches on collections in Serverless (Vector) databases that are configured to use hybrid search:

  1. Create a collection with vector, lexical, and rerank enabled.

    You can only enable hybrid search when you create a collection because collection settings are immutable after creation.

    If you want to use hybrid search on data in an existing collection, you must create a collection with hybrid search enabled, and then migrate your documents to the new collection.

  2. When you insert documents, include the values required for hybrid search.

    If you want to use hybrid search to retrieve a document, the document must have a vector embedding and a string for lexical tokenization. There are three ways to provide these values. For each document, you must do one of the following:

    • Include the $vector and $lexical fields.

      $vector is the vector embedding for vector search, and $lexical is the string to tokenize for lexical search.

    • Include the $vectorize and $lexical fields.

      Both fields are strings. $vectorize is converted to a vector embedding through a vectorize integration, and $lexical is tokenized for lexical search.

    • Use the $hybrid shorthand to populate the $vectorize and $lexical fields.

      Use $hybrid if you want to use the same string for $vectorize and $lexical.

    For information about inserting documents that support hybrid search, see Insert a document and Insert documents.

  3. Use the findAndRerank command to run a hybrid search.

    For all options and examples, see the documentation for the findAndRerank command.
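
The three document shapes from step 2, and the general shape of a findAndRerank request, can be sketched as plain payloads. These are sketches based on this page, with invented example values; check the Data API reference for the exact fields your client expects.

```python
# Option 1: bring your own embedding plus a string to tokenize.
doc_a = {"$vector": [0.12, 0.45, 0.83], "$lexical": "waterproof hiking boots"}

# Option 2: let a vectorize integration embed one string while a
# (possibly different) string is tokenized for lexical search.
doc_b = {"$vectorize": "waterproof hiking boots", "$lexical": "waterproof hiking boots size 10"}

# Option 3: the $hybrid shorthand uses the same string for both fields.
doc_c = {"$hybrid": "waterproof hiking boots"}

# A findAndRerank command body (hypothetical example values; the exact
# option names are documented in the Data API reference).
find_and_rerank = {
    "findAndRerank": {
        "filter": {"productType": "boots"},
        "sort": {"$hybrid": "waterproof hiking boots"},
        "options": {"limit": 5},
    }
}
```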


© 2025 DataStax | Privacy policy | Terms of use | Manage Privacy Choices
