Find and rerank documents

Hybrid search, lexical search, and reranking are currently in public preview. Development is ongoing, and the features and functionality are subject to change. Astra DB Serverless, and the use of such, is subject to the DataStax Preview Terms.

Finds documents in a collection through a retrieval process that uses a reranker model to combine results from a vector search and a lexical search. This process is called hybrid search. For more information about hybrid search mechanics and best practices, see Find data with hybrid search.

For other ways to find documents, including standalone vector search and exact-value filters, see Find documents.

This method requires the following:

  • A Serverless (Vector) database in the AWS us-east-2 region.

  • A collection with vector, lexical, and rerank enabled. For more information, see Create a collection that supports hybrid search.

  • Documents with the $lexical and $vector fields populated. Documents without both of these fields are excluded from hybrid search.

Method signature

  • Python

  • TypeScript

  • Java

  • curl

The following method belongs to the astrapy.Collection class.

find_and_rerank(
  filter: Dict[str, Any],
  *,
  sort: Dict[str, Any],
  projection: Dict[str, bool],
  document_type: type,
  limit: int,
  hybrid_limits: int | dict[str, int],
  include_scores: bool,
  include_sort_vector: bool,
  rerank_on: str,
  rerank_query: str,
  request_timeout_ms: int,
  timeout_ms: int,
) -> CollectionFindAndRerankCursor

The following method belongs to the Collection class.

findAndRerank(
  filter: CollectionFilter<Schema>,
  options?: {
    sort?: HybridSort,
    projection?: Projection,
    limit?: number,
    hybridLimits?: number | Record<string, number>,
    rerankOn?: string,
    rerankQuery?: string,
    includeScores?: boolean,
    includeSortVector?: boolean,
    timeout?: number | TimeoutDescriptor,
  },
): CollectionFindAndRerankCursor<Schema, Schema>

The following methods belong to the com.datastax.astra.client.collections.Collection class.

CollectionFindAndRerankCursor<T, R> findAndRerank(
    Filter filter,
    CollectionFindAndRerankOptions options,
    Class<R> newRowType
);
CollectionFindAndRerankCursor<T, T> findAndRerank(
    Filter filter,
    CollectionFindAndRerankOptions options
);
CollectionFindAndRerankCursor<T, T> findAndRerank(
    CollectionFindAndRerankOptions options
);

Omitting filter defaults to an empty filter. Omitting newRowType defaults to typing the returned documents as T, which is the same type as the collection.

The method signature depends on whether you query through the $vector or $vectorize field.

  • With $vectorize

  • Without $vectorize

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
    "findAndRerank": {
        "filter": FILTER,
        "options": {
            "hybridLimits": HYBRID_LIMITS,
            "includeScores": BOOLEAN,
            "includeSortVector": BOOLEAN,
            "limit": INTEGER,
            "rerankOn": STRING
        },
        "projection": PROJECTION,
        "sort": SORT
    }
}'

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
    "findAndRerank": {
        "filter": FILTER,
        "options": {
            "hybridLimits": HYBRID_LIMITS,
            "includeScores": BOOLEAN,
            "includeSortVector": BOOLEAN,
            "limit": INTEGER,
            "rerankOn": STRING,
            "rerankQuery": STRING
        },
        "projection": PROJECTION,
        "sort": SORT
    }
}'
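As a rough illustration of the two request shapes above, the following Python sketch assembles the JSON body of a findAndRerank command using only the standard library. The helper name build_find_and_rerank_body is hypothetical and not part of any client. It assumes, as the Parameters section states, that rerankOn and rerankQuery must be supplied when the sort clause uses $vector.

```python
import json
from typing import Any, Dict, Optional


def build_find_and_rerank_body(
    sort: Dict[str, Any],
    filter: Optional[Dict[str, Any]] = None,
    projection: Optional[Dict[str, bool]] = None,
    **options: Any,
) -> str:
    """Hypothetical helper: assemble a findAndRerank request body as JSON."""
    hybrid = sort.get("$hybrid")
    uses_vector = isinstance(hybrid, dict) and "$vector" in hybrid
    if uses_vector and not ("rerankOn" in options and "rerankQuery" in options):
        raise ValueError("rerankOn and rerankQuery are required with $vector")

    # sort, filter, and projection sit beside options, not inside it.
    command: Dict[str, Any] = {"sort": sort}
    if filter is not None:
        command["filter"] = filter
    if projection is not None:
        command["projection"] = projection
    if options:
        command["options"] = options
    return json.dumps({"findAndRerank": command})


# With $vectorize: one string drives both the vector and lexical searches.
print(build_find_and_rerank_body({"$hybrid": "A tree in the woods"}, limit=5))

# Without $vectorize: the reranker needs an explicit query and field.
print(build_find_and_rerank_body(
    {"$hybrid": {"$vector": [0.1, -0.2, 0.5], "$lexical": "A house on a hill"}},
    rerankOn="$lexical",
    rerankQuery="A house on a hill",
))
```

The resulting string can be passed as the --data argument of the curl commands shown above.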

Result

  • Python

  • TypeScript

  • Java

  • curl

Returns a cursor (CollectionFindAndRerankCursor) for iterating over the documents returned by the reranker.

Iterating over the cursor yields RerankedResult objects, which represent the returned documents. The fields included in the returned documents depend on the subset of fields that were requested in the projection.

Each RerankedResult object also includes a dictionary of the scores from the retrieval process. If scores were not requested, the dictionary is empty.

If requested, the result will also include the sort vector used for the underlying vector search. Calling .get_sort_vector() on the cursor reads the sort vector.

Cursors are lazy iterators, meant to be consumed with for loops or equivalent constructs. You must iterate over the cursor to fetch matching documents and their scores. If you need a list of all results, you can call the to_list() method on the cursor.

For more information about the operations available on cursors, see FindCursor.
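The consumption pattern described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes collection is an astrapy Collection (or any object exposing the same find_and_rerank method) on a collection with vectorize enabled, so that a single string can be used for $hybrid. The helper name collect_scored_results is hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def collect_scored_results(
    collection: Any, query: str, limit: int = 5
) -> Tuple[List[Tuple[Dict[str, Any], Dict[str, float]]], Optional[Any]]:
    """Hypothetical helper: run a hybrid search and gather documents with scores."""
    cursor = collection.find_and_rerank(
        sort={"$hybrid": query},
        limit=limit,
        include_scores=True,
        include_sort_vector=True,
    )
    # Iterating over the cursor is what actually fetches the results.
    pairs = [(result.document, result.scores) for result in cursor]
    # The sort vector is read from the cursor, not from individual results.
    sort_vector = cursor.get_sort_vector()
    return pairs, sort_vector
```

Each entry in pairs holds a returned document and its scores dictionary. The dictionary would be empty if include_scores were left at its default.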

Returns a cursor (CollectionFindAndRerankCursor<Schema, Schema>) for iterating over the documents returned by the reranker.

Iterating over the cursor yields RerankedResult<TRaw> objects, which represent the returned documents. The fields included in the returned documents depend on the subset of fields that were requested in the projection.

Each RerankedResult object also includes a dictionary of the scores from the retrieval process. If scores were not requested, the dictionary is empty.

If requested, the result will also include the sort vector used for the underlying vector search. Calling .getSortVector() on the cursor reads the sort vector.

Cursors are lazy iterators, meant to be consumed with for await loops or equivalent constructs. You must iterate over the cursor to fetch matching documents and their scores. If you need a list of all results, you can call the toArray() method on the cursor.

For more information about the operations available on cursors, see FindCursor.

Returns a cursor (CollectionFindAndRerankCursor) for iterating over the documents returned by the reranker.

Iterating over the cursor yields RerankedResult objects, which represent the returned documents.

The fields included in the returned documents depend on the subset of fields that were requested in the projection.

Each RerankedResult object also includes a map of the scores from the retrieval process. If scores were not requested, the map is empty.

If requested, the result will also include the sort vector used for the underlying vector search. Calling .getSortVector() on the cursor reads the sort vector.

Cursors are lazy iterators, meant to be consumed with for loops or equivalent constructs. You must iterate over the cursor to fetch matching documents and their scores. If you need a list of all results, you can call the .toList() method on the cursor.

For more information about the methods available on cursors, see AbstractCursor.

The response includes a data.documents property, which is an array of objects representing the documents returned by the reranker. The fields included in the returned documents depend on the subset of fields that were requested in the projection.

If requested, the response also includes a status.documentResponses property, which is a list of the scores from the retrieval process for each document.

If requested, the response also includes a status.sortVector property, which is the sort vector used for the underlying vector search.

This command always returns a single page of results, so data.nextPageState in the response is always null.

Example response without requesting sort vector or scores:

{
    "data": {
        "documents": [
            {
                "$lexical": "the house on the hill",
                "_id": "doc_a",
                "content": "the house on the hill",
                "tag": "x"
            },
            {
                "$lexical": "the tree in the woods",
                "_id": "doc_b",
                "content": "the tree in the woods",
                "tag": "x"
            }
        ],
        "nextPageState": null
    }
}

Example response with sort vector and scores requested:

{
    "data": {
        "documents": [
            {
                "$lexical": "the house on the hill",
                "_id": "doc_a",
                "content": "the house on the hill",
                "tag": "x"
            },
            {
                "$lexical": "the tree in the woods",
                "_id": "doc_b",
                "content": "the tree in the woods",
                "tag": "x"
            }
        ],
        "nextPageState": null
    },
    "status": {
        "documentResponses": [
            {
                "scores": {
                    "$rerank": -9.1015625,
                    "$vector": 0.96291006
                }
            },
            {
                "scores": {
                    "$rerank": -12.515625,
                    "$vector": 0.19139329
                }
            }
        ],
        "sortVector": [1.0, 1.0, 1.0]
    }
}

Example response if no documents were found, with sort vector and scores requested:

{
    "data": {
        "documents": [],
        "nextPageState": null
    },
    "status": {
        "documentResponses": [],
        "sortVector": [1.0, 1.0, 1.0]
    }
}
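Because status.documentResponses is index-matched to data.documents, a client working with the raw JSON has to pair the two lists itself. The following Python sketch shows one way to do that with the standard library. The helper name pair_documents_with_scores is hypothetical, and the sample payload is a trimmed copy of the example response above.

```python
import json
from typing import Any, Dict, List, Tuple


def pair_documents_with_scores(
    response: Dict[str, Any],
) -> List[Tuple[Dict[str, Any], Dict[str, float]]]:
    """Hypothetical helper: match each document to its scores by position."""
    documents = response.get("data", {}).get("documents", [])
    # status is absent when neither scores nor the sort vector were requested.
    score_entries = response.get("status", {}).get("documentResponses", [])
    paired = []
    for index, document in enumerate(documents):
        scores = score_entries[index]["scores"] if index < len(score_entries) else {}
        paired.append((document, scores))
    return paired


raw = json.loads("""
{
  "data": {"documents": [{"_id": "doc_a"}, {"_id": "doc_b"}], "nextPageState": null},
  "status": {
    "documentResponses": [
      {"scores": {"$rerank": -9.1015625, "$vector": 0.96291006}},
      {"scores": {"$rerank": -12.515625, "$vector": 0.19139329}}
    ]
  }
}
""")
for document, scores in pair_documents_with_scores(raw):
    print(document["_id"], scores.get("$rerank"))
```

When scores were not requested, every document is paired with an empty dictionary, which mirrors the behavior the clients document for RerankedResult.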

Parameters

  • Python

  • TypeScript

  • Java

  • curl

Name Type Summary

filter

Dict[str, Any]

Optional. An object that defines filter criteria using the Data API filter syntax. The method only finds documents that match the filter criteria.

For a list of available filter operators and more examples, see Filter operators for collections.

Filters can use only indexed fields. If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in a filter.

Default: No filter, meaning any document is a possible match.

sort

Dict[str, Any]

Specifies queries for the underlying vector and lexical searches.

For a collection without vectorize, pass query items for $vector and $lexical:

  • sort={"$hybrid": {"$vector": [0.1, -0.2, 0.5], "$lexical": "A house on a hill"}}

  • sort={"$hybrid": {"$vector": DataAPIVector([0.1, -0.2, 0.5]), "$lexical": "A house on a hill"}}

If your collection has vectorize enabled, you can query through the $vectorize field instead of the $vector field. You can also use a single search string for both the $vectorize and $lexical queries.

  • sort={"$hybrid": {"$vectorize": "A tree in the woods", "$lexical": "A house on a hill"}}

  • sort={"$hybrid": "A tree in the woods"}

If you query $vector, you must specify rerank_query and rerank_on.

projection

Dict[str, bool]

Optional. Controls which fields are included or excluded in the returned document.

For more information, see Projections for collections.

Default: The default projection for the collection. Certain fields, like $vector and $vectorize, are excluded by default. Certain fields, like _id, are included by default.

document_type

type

Optional. A specifier for the type checker.

You may want to use this parameter if your code is strictly typed, especially if you specify a projection.

For more information, see Typing support.

Default: dict[str, Any]. The returned cursor is implicitly a CollectionFindAndRerankCursor[DOC, RerankedResult[DOC]], meaning that it maintains the same type for the returned documents as that of the documents in the collection.

limit

int

Optional. Limit the total number of documents returned. Once limit is reached, or the cursor is exhausted due to lack of matching documents, nothing more is returned.

Default: The limit set by the Data API.

hybrid_limits

int | dict[str, int]

Optional. Limit the number of documents returned by the underlying vector and lexical searches.

If a single number is specified, it applies to both the vector and lexical searches.

To set different limits for the vector and lexical searches, specify a dictionary in the form {"$vector": INTEGER, "$lexical": INTEGER}.

Default: The value of limit.

include_scores

bool

Optional. Whether to include the scores from the reranking process in the response.

These scores can be inspected in the scores attribute of each RerankedResult object yielded by the CollectionFindAndRerankCursor. This attribute is a free-form dictionary such as {"$vector": 0.81, "$rerank": 0.12}. See the examples for usage.

If false, the scores attribute of each RerankedResult object is an empty dictionary.

Default: False

include_sort_vector

bool

Optional. Whether to include the sort vector that was used for the underlying vector search in the response.

This can be useful if you query through the $vectorize field instead of the $vector field, since you don’t know the sort vector in advance.

The sort vector can be read by calling the .get_sort_vector() method on the returned cursor.

Default: False

rerank_on

str

Required if you use $vector in sort; otherwise optional.

The document field to use for the reranking step. Once the underlying vector and lexical searches complete, the reranker compares the rerank_query text with each document’s rerank_on field.

The reserved $lexical field is often used for this parameter, but you can specify any field that stores a string.

Any document lacking the field is excluded.

Default unless you use $vector in sort: "$lexical".

rerank_query

str

Required if you use $vector in sort; otherwise optional.

Query text for the reranker step.

Once the underlying vector and lexical searches complete, the reranker compares the rerank_query text with each document’s rerank_on field.

Default unless you use $vector in sort: the query used for the underlying vector search, which is specified by the sort parameter.

request_timeout_ms

int

Optional. The maximum time, in milliseconds, that the client should wait for each underlying HTTP request.

Default: The default value for the collection. This default is 10 seconds unless you specified a different default when you initialized the Collection or DataAPIClient object. For more information, see Timeout options.

timeout_ms

int

Optional. An alias for request_timeout_ms.
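The interaction between limit and hybrid_limits in the table above can be summarized with a small Python sketch. This only illustrates the documented rules; the actual resolution is performed by the Data API, not by client code. The helper name resolve_hybrid_limits is hypothetical, and the sketch assumes that a dictionary value provides both the $vector and $lexical keys.

```python
from typing import Dict, Optional, Union


def resolve_hybrid_limits(
    hybrid_limits: Union[int, Dict[str, int], None],
    limit: Optional[int],
) -> Dict[str, Optional[int]]:
    """Hypothetical helper: show which limit each underlying search ends up with."""
    if hybrid_limits is None:
        # Documented default: fall back to the value of limit.
        return {"$vector": limit, "$lexical": limit}
    if isinstance(hybrid_limits, int):
        # A single number applies to both searches.
        return {"$vector": hybrid_limits, "$lexical": hybrid_limits}
    # A dictionary sets the two limits independently (both keys assumed present).
    return {"$vector": hybrid_limits["$vector"], "$lexical": hybrid_limits["$lexical"]}


print(resolve_hybrid_limits(None, 10))
print(resolve_hybrid_limits(20, 10))
print(resolve_hybrid_limits({"$vector": 30, "$lexical": 15}, 10))
```

A None value in the result stands for "neither parameter was set", in which case the limit set by the Data API applies.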

Name Type Summary

filter

CollectionFilter<Schema>

An object that defines filter criteria using the Data API filter syntax. The method only finds documents that match the filter criteria.

For a list of available filter operators and more examples, see Filter operators for collections.

Filters can use only indexed fields. If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in a filter.

options

CollectionFindAndRerankOptions

Optional. The options for this operation. See the options table for more details.

Most of the options may be provided either as values in the options object or as builder methods on the cursor itself (for example, cursor.sort(...)).

Properties of options
Name Type Summary

sort

HybridSort

Specifies queries for the underlying vector and lexical searches.

For a collection without vectorize, pass query items for $vector and $lexical:

  • { $hybrid: { $vector: [0.1, -0.2, 0.5], $lexical: "A house on a hill" } }

  • { $hybrid: { $vector: new DataAPIVector([0.1, -0.2, 0.5]), $lexical: "A house on a hill" } }

If your collection has vectorize enabled, you can query through the $vectorize field instead of the $vector field. You can also use a single search string for both the $vectorize and $lexical queries.

  • { $hybrid: { $vectorize: "A tree in the woods", $lexical: "A house on a hill" } }

  • { $hybrid: "A tree in the woods" }

If you query $vector, you must specify rerankQuery and rerankOn.

projection

Projection

Optional. Controls which fields are included or excluded in the returned document.

For more information, see Projections for collections.

Default: The default projection for the collection. Certain fields, like $vector and $vectorize, are excluded by default. Certain fields, like _id, are included by default.

limit

number

Optional. Limit the total number of documents returned. Once limit is reached, or the cursor is exhausted due to lack of matching documents, nothing more is returned.

Default: The limit set by the Data API.

hybridLimits

number | Record<string, number>

Optional. Limit the number of documents returned by the underlying vector and lexical searches.

If a single number is specified, it applies to both the vector and lexical searches.

To set different limits for the vector and lexical searches, specify an object in the form {$vector: INTEGER, $lexical: INTEGER}.

Default: The value of limit.

includeScores

boolean

Optional. Whether to include the scores from the reranking process in the response.

These scores can be inspected in the scores attribute of each RerankedResult<TRaw> object yielded by the CollectionFindAndRerankCursor. This attribute is a free-form object such as { $vector: 0.81, $rerank: 0.12 }. See the examples for usage.

If false, the scores attribute of each RerankedResult object is an empty object.

Default: false

includeSortVector

boolean

Optional. Whether to include the sort vector that was used for the underlying vector search in the response.

This can be useful if you query through the $vectorize field instead of the $vector field, since you don’t know the sort vector in advance.

The sort vector can be read by calling the .getSortVector() method on the returned cursor.

Default: false

rerankOn

string

Required if you use $vector in sort; otherwise optional.

The document field to use for the reranking step. Once the underlying vector and lexical searches complete, the reranker compares the rerankQuery text with each document’s rerankOn field.

The reserved $lexical field is often used for this parameter, but you can specify any field that stores a string.

Any document lacking the field is excluded.

Default unless you use $vector in sort: "$lexical".

rerankQuery

string

Required if you use $vector in sort; otherwise optional.

Query text for the reranker step.

Once the underlying vector and lexical searches complete, the reranker compares the rerankQuery text with each document’s rerankOn field.

Default unless you use $vector in sort: the query used for the underlying vector search, which is specified by the sort parameter.

timeout

number | TimeoutDescriptor

Optional.

The timeout(s) to apply to this method. You can specify requestTimeoutMs and generalMethodTimeoutMs. Since this method issues a single HTTP request, these timeouts are equivalent.

Details about the timeout parameter

The TimeoutDescriptor object can contain these properties:

  • requestTimeoutMs (number): The maximum time, in milliseconds, that the client should wait for each underlying HTTP request. Default: The default value for the collection. This default is 10 seconds unless you specified a different default when you initialized the Collection or DataAPIClient object.

  • generalMethodTimeoutMs (number): The maximum time, in milliseconds, that the whole operation can take. Since this method issues a single HTTP request, generalMethodTimeoutMs and requestTimeoutMs are equivalent. If you specify both, the minimum of the two will be used. Default: The default value for the collection. This default is 30 seconds unless you specified a different default when you initialized the Collection or DataAPIClient object.

If you specify a number instead of a TimeoutDescriptor object, that number will be applied to both requestTimeoutMs and generalMethodTimeoutMs.

Name Type Summary

filter

Filter

An object that defines filter criteria using the Data API filter syntax. The method only finds documents that match the filter criteria.

Filters can use only indexed fields. If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in a filter.

options

CollectionFindAndRerankOptions

Optional. The options for this operation. See the options table for more details.

Most of the options may be provided either as values in the options object or as builder methods on the cursor itself (for example, cursor.sort(...)).

Properties of options
Name Type Summary

sort

Sort

Specifies queries for the underlying vector and lexical searches.

For a collection without vectorize, pass query items for $vector and $lexical:

Sort.hybrid(new Hybrid()
  .vector(new float[] {0.1f, -0.2f, 0.5f})
  .lexical("A house on a hill"));

Sort.hybrid(new Hybrid()
  .vector(new DataAPIVector(new float[] {0.1f, -0.2f, 0.5f}))
  .lexical("A house on a hill"));

If your collection has vectorize enabled, you can query through the $vectorize field instead of the $vector field. You can also use a single search string for both the $vectorize and $lexical queries.

Sort.hybrid(new Hybrid()
  .vectorize("A tree in the woods")
  .lexical("A house on a hill"));

Sort.hybrid(new Hybrid("A tree in the woods"));

If you query $vector, you must specify rerankQuery and rerankOn.

projection

Projection

Optional. Controls which fields are included or excluded in the returned document.

For more information, see Projections for collections.

Default: The default projection for the collection. Certain fields, like $vector and $vectorize, are excluded by default. Certain fields, like _id, are included by default.

limit

Integer

Optional. Limit the total number of documents returned. Once limit is reached, or the cursor is exhausted due to lack of matching documents, nothing more is returned.

Default: The limit set by the Data API.

hybridLimits

Integer | Map<String, Integer>

Optional. Limit the number of documents returned by the underlying vector and lexical searches.

If a single number is specified, it applies to both the vector and lexical searches.

To set different limits for the vector and lexical searches, specify a map with the keys $vector and $lexical and an integer value for each.

Default: The value of limit.

includeScores

boolean

Optional. Whether to include the scores from the reranking process in the response.

These scores can be inspected in the scores attribute of each RerankedResult<R> object yielded by the CollectionFindAndRerankCursor. This attribute is a free-form object such as { $vector: 0.81, $rerank: 0.12 }.

If false, the scores attribute of each RerankedResult object is an empty object.

Default: false

includeSortVector

boolean

Optional. Whether to include the sort vector that was used for the underlying vector search in the response.

This can be useful if you query through the $vectorize field instead of the $vector field, since you don’t know the sort vector in advance.

The sort vector can be read by calling the .getSortVector() method on the returned cursor.

Default: false

rerankOn

String

Required if you use $vector in sort; otherwise optional.

The document field to use for the reranking step. Once the underlying vector and lexical searches complete, the reranker compares the rerankQuery text with each document’s rerankOn field.

The reserved $lexical field is often used for this parameter, but you can specify any field that stores a string.

Any document lacking the field is excluded.

Default unless you use $vector in sort: "$lexical".

rerankQuery

String

Required if you use $vector in sort; otherwise optional.

Query text for the reranker step.

Once the underlying vector and lexical searches complete, the reranker compares the rerankQuery text with each document’s rerankOn field.

Default unless you use $vector in sort: the query used for the underlying vector search, which is specified by the sort parameter.

Use the findAndRerank command with these parameters:

Name Type Summary

filter

object

Optional. An object that defines filter criteria using the Data API filter syntax. The method only finds documents that match the filter criteria.

For a list of available filter operators and more examples, see Filter operators for collections.

Filters can use only indexed fields. If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in a filter.

Default: No filter, meaning any document is a possible match.

sort

object

Specifies queries for the underlying vector and lexical searches.

For a collection without vectorize, pass query items for $vector and $lexical:

  • "sort": {"$hybrid": {"$vector": [0.1, -0.2, 0.5], "$lexical": "A house on a hill"}}

If your collection has vectorize enabled, you can query through the $vectorize field instead of the $vector field. You can also use a single search string for both the $vectorize and $lexical queries.

  • "sort": {"$hybrid": {"$vectorize": "A tree in the woods", "$lexical": "A house on a hill"}}

  • "sort": {"$hybrid": "A tree in the woods"}

If you query $vector, you must specify rerankQuery and rerankOn.

projection

object

Optional. Controls which fields are included or excluded in the returned document.

For more information, see Projections for collections.

Default: The default projection for the collection. Certain fields, like $vector and $vectorize, are excluded by default. Certain fields, like _id, are included by default.

options

object

Optional. The options for this operation. See the options table for more details.

Properties of options:
Name Type Summary

limit

integer

Optional. Limit the total number of documents returned.

Default: The limit set by the Data API.

hybridLimits

integer | object

Optional. Limit the number of documents returned by the underlying vector and lexical searches.

If a single number is specified, it applies to both the vector and lexical searches.

To set different limits for the vector and lexical searches, specify an object in the form {"$vector": INTEGER, "$lexical": INTEGER}.

Default: The value of limit.

includeScores

boolean

Optional. Whether to include the scores from the reranking process in the response.

These scores are returned in the status.documentResponses property of the response as a list of objects. The list is index matched to the list of returned documents. Each score is an object such as {"$vector": 0.81, "$rerank": 0.12}. See the examples for usage.

Default: false

includeSortVector

boolean

Optional. Whether to include the sort vector that was used for the underlying vector search in the response.

This can be useful if you query through the $vectorize field instead of the $vector field, since you don’t know the sort vector in advance.

The sort vector, if requested, is returned in the status.sortVector property of the response.

Default: false

rerankOn

string

Required if you use $vector in sort; otherwise optional.

The document field to use for the reranking step. Once the underlying vector and lexical searches complete, the reranker compares the rerankQuery text with each document’s rerankOn field.

The reserved $lexical field is often used for this parameter, but you can specify any field that stores a string.

Any document lacking the field is excluded.

Default unless you use $vector in sort: "$lexical".

rerankQuery

string

Required if you use $vector in sort; otherwise optional.

Query text for the reranker step.

Once the underlying vector and lexical searches complete, the reranker compares the rerankQuery text with each document’s rerankOn field.

Default unless you use $vector in sort: the query used for the underlying vector search, which is specified by the sort parameter.

Examples

The following examples demonstrate how to find documents with hybrid search.

  • Python

  • TypeScript

  • Java

  • curl

  • With $vectorize

  • Without $vectorize

Use the sort parameter to specify the queries for the underlying vector search and lexical search.

If your collection has vectorize enabled, you can specify a search vector and query through the $vector field, as the "Without $vectorize" example demonstrates.

Alternatively, you can specify a search string and query through the $vectorize field. This search string is converted to a search vector for vector search. You can specify the same string for both the vector and lexical searches, or you can specify two separate strings.

Example with the same string for the vector and lexical searches:

from astrapy import DataAPIClient

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
  "ASTRA_DB_API_ENDPOINT",
  token="ASTRA_DB_APPLICATION_TOKEN",
)
collection = database.get_collection("COLLECTION_NAME")

# Find documents
cursor = collection.find_and_rerank(sort={"$hybrid": "A tree on a hill"})

# Iterate over the found documents
for result in cursor:
    print(result.document)

Example with different strings for the vector and lexical searches:

from astrapy import DataAPIClient

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
  "ASTRA_DB_API_ENDPOINT",
  token="ASTRA_DB_APPLICATION_TOKEN",
)
collection = database.get_collection("COLLECTION_NAME")

# Find documents
cursor = collection.find_and_rerank(
    sort={
        "$hybrid": {
            "$vectorize": "A tree on a hill",
            "$lexical": "A house in the woods"
        },
    },
)

# Iterate over the found documents
for result in cursor:
    print(result.document)

If your collection doesn’t have vectorize enabled, you must use the sort parameter to specify the search vector to use for vector search and the string to use for lexical search.

The search vector can be passed as a DataAPIVector object or as an array of floats.

You must also specify the rerank_query and rerank_on parameters.

from astrapy import DataAPIClient
from astrapy.data_types import DataAPIVector

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
  "ASTRA_DB_API_ENDPOINT",
  token="ASTRA_DB_APPLICATION_TOKEN",
)
collection = database.get_collection("COLLECTION_NAME")

# Find documents
cursor = collection.find_and_rerank(
    sort={
        "$hybrid": {
            "$vector": DataAPIVector([0.3, 0.2, -0.1]),
            "$lexical": "A tree on a hill",
        },
    },
    rerank_query="A house in the woods",
    rerank_on="$lexical",
)

# Iterate over the found documents
for result in cursor:
    print(result.document)

  • With $vectorize

  • Without $vectorize

Use the sort parameter to specify the queries for the underlying vector search and lexical search.

If your collection has vectorize enabled, you can specify a search vector and query through the $vector field, as the "Without $vectorize" example demonstrates.

Alternatively, you can specify a search string and query through the $vectorize field. This search string is converted to a search vector for vector search. You can specify the same string for both the vector and lexical searches, or you can specify two separate strings.

Example with the same string for the vector and lexical searches:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Get an existing collection
const client = new DataAPIClient();
const db = client.db('ASTRA_DB_API_ENDPOINT', { token: 'ASTRA_DB_APPLICATION_TOKEN' });
const collection = db.collection('COLLECTION_NAME');

(async function () {
  // Find documents
  const cursor = await collection.findAndRerank({}, {
    sort: { $hybrid: 'A tree on a hill' },
  });

  // Iterate over the found documents
  for await (const result of cursor) {
    console.log(result.document);
  }
})();

Example with different strings for the vector and lexical searches:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Get an existing collection
const client = new DataAPIClient();
const db = client.db('ASTRA_DB_API_ENDPOINT', { token: 'ASTRA_DB_APPLICATION_TOKEN' });
const collection = db.collection('COLLECTION_NAME');

(async function () {
  // Find documents
  const cursor = await collection.findAndRerank({}, {
    sort: {
      $hybrid: { $vectorize: 'A tree on a hill', $lexical: 'A house in the woods' },
    },
  });

  // Iterate over the found documents
  for await (const result of cursor) {
    console.log(result.document);
  }
})();

If your collection doesn’t have vectorize enabled, you must use the sort parameter to specify the search vector to use for vector search and the string to use for lexical search.

The search vector can be passed as a DataAPIVector object or as an array of floats.

You must also specify the rerankOn and rerankQuery options.

import { DataAPIClient, DataAPIVector } from '@datastax/astra-db-ts';

// Get an existing collection
const client = new DataAPIClient();
const db = client.db('ASTRA_DB_API_ENDPOINT', { token: 'ASTRA_DB_APPLICATION_TOKEN' });
const collection = db.collection('COLLECTION_NAME');

(async function () {
  const cursor = await collection.findAndRerank({}, {
    sort: {
      $hybrid: { $vector: new DataAPIVector([-3., .2, -.1]), $lexical: 'A tree on a hill' },
    },
    rerankQuery: 'A house in the woods',
    rerankOn: '$lexical',
  });

  // Iterate over the found documents
  for await (const result of cursor) {
    console.log(result.document);
  }
})();
  • With $vectorize

  • Without $vectorize

Use the sort parameter to specify the queries for the underlying vector search and lexical search.

If your collection has vectorize enabled, you can still pass a search vector through the $vector field, as the "Without $vectorize" example demonstrates.

Alternatively, you can pass a search string through the $vectorize field. The string is converted to a search vector for the vector search. You can use the same string for both the vector and lexical searches, or a separate string for each.

Example with the same string for the vector and lexical searches:

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.cursor.CollectionFindAndRerankCursor;
import com.datastax.astra.client.collections.commands.options.CollectionFindAndRerankOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.core.rerank.RerankedResult;

public class FindAndRerankExample {

    public static void main(String[] args) {
        // Get an existing collection
        Collection<Document> collection = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

        // Find documents
        CollectionFindAndRerankCursor<Document, Document> cursor =
            collection.findAndRerank(
                new CollectionFindAndRerankOptions().sort(Sort.hybrid("A tree on a hill")));

        // Iterate over the results
        for (RerankedResult<Document> result : cursor) {
            System.out.println(result.getDocument());
        }
    }
}

Example with different strings for the vector and lexical searches:

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.cursor.CollectionFindAndRerankCursor;
import com.datastax.astra.client.collections.commands.options.CollectionFindAndRerankOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.hybrid.Hybrid;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.core.rerank.RerankedResult;

public class FindAndRerankExample {

    public static void main(String[] args) {
        // Get an existing collection
        Collection<Document> collection = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

        // Find documents
        Hybrid hybrid = new Hybrid()
            .vectorize("A house in the woods")
            .lexical("A tree on a hill");
        CollectionFindAndRerankCursor<Document, Document> cursor =
            collection.findAndRerank(
                new CollectionFindAndRerankOptions().sort(Sort.hybrid(hybrid)));

        // Iterate over the results
        for (RerankedResult<Document> result : cursor) {
            System.out.println(result.getDocument());
        }
    }
}

If your collection doesn’t have vectorize enabled, you must use the sort parameter to specify the search vector to use for vector search and the string to use for lexical search.

The search vector can be passed as a DataAPIVector object or as an array of floats.

You must also specify the rerankOn and rerankQuery options.

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.cursor.CollectionFindAndRerankCursor;
import com.datastax.astra.client.collections.commands.options.CollectionFindAndRerankOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.hybrid.Hybrid;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.core.rerank.RerankedResult;

public class FindAndRerankExample {

    public static void main(String[] args) {
        // Get an existing collection
        Collection<Document> collection = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

        // Find documents
        Hybrid hybrid = new Hybrid()
            .vector(new float[]{0.25f, 0.25f, 0.25f, 0.25f, 0.25f})
            .lexical("A tree on a hill");
        CollectionFindAndRerankCursor<Document, Document> cursor =
            collection.findAndRerank(
                new CollectionFindAndRerankOptions()
                    .sort(Sort.hybrid(hybrid))
                    .rerankOn("$lexical")
                    .rerankQuery("A house in the woods"));

        // Iterate over the results
        for (RerankedResult<Document> result : cursor) {
            System.out.println(result.getDocument());
        }
    }
}
  • With $vectorize

  • Without $vectorize

Use the sort parameter to specify the queries for the underlying vector search and lexical search.

If your collection has vectorize enabled, you can still pass a search vector through the $vector field, as the "Without $vectorize" example demonstrates.

Alternatively, you can pass a search string through the $vectorize field. The string is converted to a search vector for the vector search. You can use the same string for both the vector and lexical searches, or a separate string for each.

Example with the same string for the vector and lexical searches:

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findAndRerank": {
    "sort": {
      "$hybrid": "A tree in the woods"
    }
  }
}'

Example with different strings for the vector and lexical searches:

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findAndRerank": {
    "sort": {
      "$hybrid": {
        "$lexical": "A tree on a hill",
        "$vectorize": "A house in the woods"
      }
    }
  }
}'

If your collection doesn’t have vectorize enabled, you must use the sort parameter to specify the search vector to use for vector search and the string to use for lexical search.

In an HTTP request, pass the search vector as a JSON array of floats.

You must also specify the rerankOn and rerankQuery options.

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findAndRerank": {
    "options": {
      "rerankOn": "$lexical",
      "rerankQuery": "A house in the woods"
    },
    "sort": {
      "$hybrid": {
        "$lexical": "A tree on a hill",
        "$vector": [0.3, 0.2, -0.1]
      }
    }
  }
}'

Use a different query in the reranking step

The results of the underlying vector search and lexical search are run through a reranker model. The reranker uses a search string to rerank the documents that were returned by the underlying searches.

If you query through the $vector field, you must specify the search string for the reranker to use and the field to rerank the documents on.

If you query through the $vectorize field, the reranker will use the string that was used to perform the underlying vector search unless you specify a different string. It will also rerank documents on their $lexical field, unless you specify a different field.
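The rule above can be sketched as a small client-side check. The helper below is hypothetical: missing_rerank_args is not part of astrapy or the Data API, and it assumes the Python-style parameter names rerank_on and rerank_query. It only reports which reranker arguments a sort clause still needs under the rule as stated here.

```python
from typing import Any, Dict, List, Optional


def missing_rerank_args(
    sort: Dict[str, Any],
    rerank_on: Optional[str] = None,
    rerank_query: Optional[str] = None,
) -> List[str]:
    """Return the reranker arguments that a query still needs.

    Hypothetical client-side check; not part of astrapy or the Data API.
    """
    hybrid = sort.get("$hybrid")
    # Only a raw $vector leaves the reranker without a default query string.
    if not (isinstance(hybrid, dict) and "$vector" in hybrid):
        return []
    missing = []
    if rerank_on is None:
        missing.append("rerank_on")
    if rerank_query is None:
        missing.append("rerank_query")
    return missing


# A plain string (or a $vectorize entry) needs no extra reranker arguments.
print(missing_rerank_args({"$hybrid": "A tree in the woods"}))

# A raw vector needs both reranker arguments.
print(missing_rerank_args(
    {"$hybrid": {"$vector": [0.3, 0.2, -0.1], "$lexical": "A tree in the woods"}}
))
```

Running a check like this before sending the query may surface a missing argument earlier than the server response would.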

  • Python

  • TypeScript

  • Java

  • curl

Use the rerank_query parameter to specify the search string for the reranker. Use the rerank_on parameter to specify which field to rerank the documents by.

  • With $vectorize

  • Without $vectorize

from astrapy import DataAPIClient

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
  "ASTRA_DB_API_ENDPOINT",
  token="ASTRA_DB_APPLICATION_TOKEN",
)
collection = database.get_collection("COLLECTION_NAME")

# Find documents
cursor = collection.find_and_rerank(
    sort={"$hybrid": "A tree in the woods"},
    rerank_query="A house on a hill",
)

# Iterate over the found documents
for result in cursor:
    print(result.document)

Example without $vectorize:

from astrapy import DataAPIClient

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
  "ASTRA_DB_API_ENDPOINT",
  token="ASTRA_DB_APPLICATION_TOKEN",
)
collection = database.get_collection("COLLECTION_NAME")

# Find documents
cursor = collection.find_and_rerank(
    sort={
        "$hybrid": {
            "$vector": [0.3, 0.2, -0.1],
            "$lexical": "A tree in the woods",
        },
    },
    rerank_query="A house on a hill",
    rerank_on="$lexical",
)

# Iterate over the found documents
for result in cursor:
    print(result.document)

Use the rerankQuery parameter to specify the search string for the reranker. Use the rerankOn parameter to specify which field to rerank the documents by.

  • With $vectorize

  • Without $vectorize

import { DataAPIClient } from '@datastax/astra-db-ts';

// Get an existing collection
const client = new DataAPIClient();
const db = client.db('ASTRA_DB_API_ENDPOINT', { token: 'ASTRA_DB_APPLICATION_TOKEN' });
const collection = db.collection('COLLECTION_NAME');

(async function () {
  // Find documents
  const cursor = await collection.findAndRerank({}, {
    sort: { $hybrid: 'A tree on a hill' },
    rerankQuery: 'A house on a hill',
  });

  // Iterate over the found documents
  for await (const result of cursor) {
    console.log(result.document);
  }
})();

Example without $vectorize:

import { DataAPIClient, DataAPIVector } from '@datastax/astra-db-ts';

// Get an existing collection
const client = new DataAPIClient();
const db = client.db('ASTRA_DB_API_ENDPOINT', { token: 'ASTRA_DB_APPLICATION_TOKEN' });
const collection = db.collection('COLLECTION_NAME');

(async function () {
  // Find documents
  const cursor = await collection.findAndRerank({}, {
    sort: {
      $hybrid: { $vector: new DataAPIVector([-3., .2, -.1]), $lexical: 'A tree on a hill' },
    },
    rerankQuery: 'A house on a hill',
    rerankOn: '$lexical',
  });

  // Iterate over the found documents
  for await (const result of cursor) {
    console.log(result.document);
  }
})();

Use the rerankQuery parameter to specify the search string for the reranker. Use the rerankOn parameter to specify which field to rerank the documents by.

  • With $vectorize

  • Without $vectorize

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.cursor.CollectionFindAndRerankCursor;
import com.datastax.astra.client.collections.commands.options.CollectionFindAndRerankOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.hybrid.Hybrid;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.core.rerank.RerankedResult;
import com.datastax.astra.client.core.vector.DataAPIVector;

public class FindAndRerankExample {

    public static void main(String[] args) {
        // Get an existing collection
        Collection<Document> collection = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

        // Find documents
        CollectionFindAndRerankCursor<Document, Document> cursor =
            collection.findAndRerank(
                new CollectionFindAndRerankOptions()
                    .sort(Sort.hybrid("A tree on a hill"))
                    .rerankQuery("A house in the woods")
            );

        // Iterate over the results
        for (RerankedResult<Document> result : cursor) {
            System.out.println(result.getDocument());
        }
    }
}

Example without $vectorize:

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.cursor.CollectionFindAndRerankCursor;
import com.datastax.astra.client.collections.commands.options.CollectionFindAndRerankOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.hybrid.Hybrid;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.core.rerank.RerankedResult;
import com.datastax.astra.client.core.vector.DataAPIVector;

public class FindAndRerankExample {

    public static void main(String[] args) {

        // Get an existing collection
        Collection<Document> collection = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

        // Find documents
        Hybrid hybrid = new Hybrid()
            .vector(new DataAPIVector(new float[]{0.25f, 0.25f, 0.25f, 0.25f, 0.25f}))
            .lexical("A tree on a hill");
        CollectionFindAndRerankCursor<Document, Document> cursor =
            collection.findAndRerank(
                new CollectionFindAndRerankOptions()
                    .sort(Sort.hybrid(hybrid))
                    .rerankOn("$lexical")
                    .rerankQuery("A house in the woods"));

        // Iterate over the results
        for (RerankedResult<Document> result : cursor) {
            System.out.println(result.getDocument());
        }
    }
}

Use the options.rerankQuery parameter to specify the search string for the reranker. Use the options.rerankOn parameter to specify which field to rerank the documents by.

  • With $vectorize

  • Without $vectorize

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findAndRerank": {
    "options": {
      "rerankQuery": "A house on a hill"
    },
    "sort": {
      "$hybrid": "A tree in the woods"
    }
  }
}'

Example without $vectorize:

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findAndRerank": {
    "options": {
      "rerankOn": "$lexical",
      "rerankQuery": "A house on a hill"
    },
    "sort": {
      "$hybrid": {
        "$lexical": "A tree in the woods",
        "$vector": [0.3, 0.2, -0.1]
      }
    }
  }
}'

Use a filter to find documents

You can use a filter to find documents that match specific criteria. For example, you can find documents with an isCheckedOut value of false and a numberOfPages value less than 300.

Only documents that match the filter will be included in the hybrid search.

For a list of available filter operators and more examples, see Filter operators for collections.

Filters can use only indexed fields. If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in a filter.
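To make the effect of the example filter concrete, here is a minimal local sketch of which documents it would admit to the hybrid search. The matches function and the sample documents are hypothetical and handle only $and, $lt, and exact-value equality. This is an illustration of the filter semantics, not how the Data API evaluates filters.

```python
from typing import Any, Dict


def matches(document: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Evaluate a small subset of filter syntax against one document.

    Supports only $and, $lt, and exact-value equality. Illustration only.
    """
    for key, condition in flt.items():
        if key == "$and":
            # Every sub-filter in the list must match.
            if not all(matches(document, sub) for sub in condition):
                return False
        elif isinstance(condition, dict) and "$lt" in condition:
            value = document.get(key)
            # A missing field never satisfies a $lt comparison.
            if value is None or not value < condition["$lt"]:
                return False
        elif document.get(key) != condition:
            return False
    return True


book_filter = {
    "$and": [
        {"isCheckedOut": False},
        {"numberOfPages": {"$lt": 300}},
    ]
}

# Made-up sample documents.
books = [
    {"title": "A", "isCheckedOut": False, "numberOfPages": 120},
    {"title": "B", "isCheckedOut": True, "numberOfPages": 120},
    {"title": "C", "isCheckedOut": False, "numberOfPages": 450},
]

candidates = [book["title"] for book in books if matches(book, book_filter)]
print(candidates)  # prints ['A']
```

Only document A satisfies both conditions, so it is the only one that the vector and lexical searches would consider.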

  • Python

  • TypeScript

  • Java

  • curl

filter is the only positional parameter for this method. Pass it first, without a keyword, and pass every other parameter by name.

  • With $vectorize

  • Without $vectorize

from astrapy import DataAPIClient

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
  "ASTRA_DB_API_ENDPOINT",
  token="ASTRA_DB_APPLICATION_TOKEN",
)
collection = database.get_collection("COLLECTION_NAME")

# Find documents
cursor = collection.find_and_rerank(
    {
        "$and": [
            {"isCheckedOut": False},
            {"numberOfPages": {"$lt": 300}},
        ]
    },
    sort={"$hybrid": "A tree in the woods"},
)

# Iterate over the found documents
for result in cursor:
    print(result.document)

Example without $vectorize:

from astrapy import DataAPIClient

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
  "ASTRA_DB_API_ENDPOINT",
  token="ASTRA_DB_APPLICATION_TOKEN",
)
collection = database.get_collection("COLLECTION_NAME")

# Find documents
cursor = collection.find_and_rerank(
    {
        "$and": [
            {"isCheckedOut": False},
            {"numberOfPages": {"$lt": 300}},
        ]
    },
    sort={
        "$hybrid": {
            "$vector": [0.3, 0.2, -0.1],
            "$lexical": "A house on a hill",
        },
    },
    rerank_query="A tree in the woods",
    rerank_on="$lexical",
)

# Iterate over the found documents
for result in cursor:
    print(result.document)

filter is the first argument of this method. Pass all other settings in the options object that follows it.

  • With $vectorize

  • Without $vectorize

import { DataAPIClient } from '@datastax/astra-db-ts';

// Get an existing collection
const client = new DataAPIClient();
const db = client.db('ASTRA_DB_API_ENDPOINT', { token: 'ASTRA_DB_APPLICATION_TOKEN' });
const collection = db.collection('COLLECTION_NAME');

(async function () {
  // Find documents
  const cursor = await collection.findAndRerank({
    $and: [
      { isCheckedOut: false },
      { numberOfPages: { $lt: 300 } },
    ],
  }, {
    sort: { $hybrid: 'query text' },
  });

  // Iterate over the found documents
  for await (const result of cursor) {
    console.log(result.document);
  }
})();

Example without $vectorize:

import { DataAPIClient, DataAPIVector } from '@datastax/astra-db-ts';

// Get an existing collection
const client = new DataAPIClient();
const db = client.db('ASTRA_DB_API_ENDPOINT', { token: 'ASTRA_DB_APPLICATION_TOKEN' });
const collection = db.collection('COLLECTION_NAME');

(async function () {
  // Find documents
  const cursor = await collection.findAndRerank({
    $and: [
      { isCheckedOut: false },
      { numberOfPages: { $lt: 300 } },
    ],
  }, {
    sort: {
      $hybrid: { $vector: new DataAPIVector([-3., .2, -.1]), $lexical: 'A tree on a hill' },
    },
    rerankQuery: 'A house on a hill',
    rerankOn: '$lexical',
  });

  // Iterate over the found documents
  for await (const result of cursor) {
    console.log(result.document);
  }
})();
  • With $vectorize

  • Without $vectorize

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
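import com.datastax.astra.client.collections.commands.cursor.CollectionFindAndRerankCursor;
import com.datastax.astra.client.collections.commands.options.CollectionFindAndRerankOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.hybrid.Hybrid;
import com.datastax.astra.client.core.query.Filter;
import com.datastax.astra.client.core.query.Filters;
import com.datastax.astra.client.core.query.Sort;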
import com.datastax.astra.client.core.rerank.RerankedResult;
import com.datastax.astra.client.core.vector.DataAPIVector;

public class FindAndRerankExample {

    public static void main(String[] args) {
        // Get an existing collection
        Collection<Document> collection = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

        // Find documents
        Filter filter = Filters.and(
            Filters.eq("isCheckedOut", false),
            Filters.lt("numberOfPages", 300));
        CollectionFindAndRerankCursor<Document, Document> cursor =
            collection.findAndRerank(filter,
                new CollectionFindAndRerankOptions().sort(Sort.hybrid("A tree on a hill")));

        // Iterate over the results
        for (RerankedResult<Document> result : cursor) {
            System.out.println(result.getDocument());
        }
    }
}

Example without $vectorize:

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
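import com.datastax.astra.client.collections.commands.cursor.CollectionFindAndRerankCursor;
import com.datastax.astra.client.collections.commands.options.CollectionFindAndRerankOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.hybrid.Hybrid;
import com.datastax.astra.client.core.query.Filter;
import com.datastax.astra.client.core.query.Filters;
import com.datastax.astra.client.core.query.Sort;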
import com.datastax.astra.client.core.rerank.RerankedResult;
import com.datastax.astra.client.core.vector.DataAPIVector;

public class FindAndRerankExample {

    public static void main(String[] args) {
        // Get an existing collection
        Collection<Document> collection = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT")
            .getCollection("COLLECTION_NAME");


        // Find documents
        Filter filter = Filters.and(
            Filters.eq("isCheckedOut", false),
            Filters.lt("numberOfPages", 300));
        Hybrid hybrid = new Hybrid()
            .vector(new float[]{0.25f, 0.25f, 0.25f, 0.25f, 0.25f})
            .lexical("lexical query");
        CollectionFindAndRerankCursor<Document, Document> cursor =
            collection.findAndRerank(
                filter,
                new CollectionFindAndRerankOptions()
                    .sort(Sort.hybrid(hybrid))
                    .rerankOn("$lexical")
                    .rerankQuery("A house in the woods"));

        // Iterate over the results
        for (RerankedResult<Document> result : cursor) {
            System.out.println(result.getDocument());
        }
    }
}
  • With $vectorize

  • Without $vectorize

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findAndRerank": {
    "filter": {
      "$and": [
        {"isCheckedOut": false},
        {"numberOfPages": {"$lt": 300}}
      ]
    },
    "sort": {
      "$hybrid": "A tree in the woods"
    }
  }
}'

Example without $vectorize:

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findAndRerank": {
    "filter": {
      "$and": [
        {"isCheckedOut": false},
        {"numberOfPages": {"$lt": 300}}
      ]
    },
    "options": {
      "rerankOn": "$lexical",
      "rerankQuery": "A tree in the woods"
    },
    "sort": {
      "$hybrid": {
        "$lexical": "A house on a hill",
        "$vector": [0.3, 0.2, -0.1]
      }
    }
  }
}'

Limit the number of documents returned

Specify a limit to return at most that number of documents.
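For HTTP requests, the curl examples on this page place limit inside options, next to rerankOn and rerankQuery, while filter and sort stay at the top level of the findAndRerank command. The sketch below assembles such a request body. The build_find_and_rerank_body helper and its parameter names are hypothetical, and the layout is inferred only from those curl examples.

```python
import json
from typing import Any, Dict, Optional


def build_find_and_rerank_body(
    sort: Dict[str, Any],
    filter_: Optional[Dict[str, Any]] = None,
    limit: Optional[int] = None,
    rerank_on: Optional[str] = None,
    rerank_query: Optional[str] = None,
) -> str:
    """Serialize a findAndRerank command (hypothetical helper)."""
    command: Dict[str, Any] = {"sort": sort}
    if filter_ is not None:
        command["filter"] = filter_
    # limit, rerankOn, and rerankQuery are grouped under "options".
    options = {
        "limit": limit,
        "rerankOn": rerank_on,
        "rerankQuery": rerank_query,
    }
    # Drop the settings that were not provided.
    options = {name: value for name, value in options.items() if value is not None}
    if options:
        command["options"] = options
    return json.dumps({"findAndRerank": command}, indent=2)


print(build_find_and_rerank_body({"$hybrid": "A tree in the woods"}, limit=2))
```

The resulting string could be passed as the --data argument of a curl request.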

  • Python

  • TypeScript

  • Java

  • curl

  • With $vectorize

  • Without $vectorize

from astrapy import DataAPIClient

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
  "ASTRA_DB_API_ENDPOINT",
  token="ASTRA_DB_APPLICATION_TOKEN",
)
collection = database.get_collection("COLLECTION_NAME")

# Find documents
cursor = collection.find_and_rerank(
    sort={"$hybrid": "A tree in the woods"},
    limit=2,
)

# Iterate over the found documents
for result in cursor:
    print(result.document)

Example without $vectorize:

from astrapy import DataAPIClient

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
  "ASTRA_DB_API_ENDPOINT",
  token="ASTRA_DB_APPLICATION_TOKEN",
)
collection = database.get_collection("COLLECTION_NAME")

# Find documents
cursor = collection.find_and_rerank(
    sort={
        "$hybrid": {
            "$vector": [0.3, 0.2, -0.1],
            "$lexical": "A house on a hill",
        },
    },
    rerank_query="A tree in the woods",
    rerank_on="$lexical",
    limit=2,
)

# Iterate over the found documents
for result in cursor:
    print(result.document)
  • With $vectorize

  • Without $vectorize

import { DataAPIClient } from '@datastax/astra-db-ts';

// Get an existing collection
const client = new DataAPIClient();
const db = client.db('ASTRA_DB_API_ENDPOINT', { token: 'ASTRA_DB_APPLICATION_TOKEN' });
const collection = db.collection('COLLECTION_NAME');

(async function () {
  // Find documents
  const cursor = await collection.findAndRerank({}, {
    sort: { $hybrid: 'query text' },
    limit: 2,
  });

  // Iterate over the found documents
  for await (const result of cursor) {
    console.log(result.document);
  }
})();

Example without $vectorize:

import { DataAPIClient, DataAPIVector } from '@datastax/astra-db-ts';

// Get an existing collection
const client = new DataAPIClient();
const db = client.db('ASTRA_DB_API_ENDPOINT', { token: 'ASTRA_DB_APPLICATION_TOKEN' });
const collection = db.collection('COLLECTION_NAME');

(async function () {
  // Find documents
  const cursor = await collection.findAndRerank({}, {
    sort: {
      $hybrid: { $vector: new DataAPIVector([-3., .2, -.1]), $lexical: 'A tree on a hill' },
    },
    rerankQuery: 'A house on a hill',
    rerankOn: '$lexical',
    limit: 2,
  });

  // Iterate over the found documents
  for await (const result of cursor) {
    console.log(result.document);
  }
})();
  • With $vectorize

  • Without $vectorize

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.cursor.CollectionFindAndRerankCursor;
import com.datastax.astra.client.collections.commands.options.CollectionFindAndRerankOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.hybrid.Hybrid;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.core.rerank.RerankedResult;
import com.datastax.astra.client.core.vector.DataAPIVector;

public class FindAndRerankExample {
    public static void main(String[] args) {
        // Get an existing collection
        Collection<Document> collection = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

        // Find documents
        CollectionFindAndRerankCursor<Document, Document> cursor =
            collection.findAndRerank(
                new CollectionFindAndRerankOptions()
                    .sort(Sort.hybrid("A tree on a hill"))
                    .limit(2));

        // Iterate over the results
        for (RerankedResult<Document> result : cursor) {
            System.out.println(result.getDocument());
        }
    }
}

Example without $vectorize:

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.cursor.CollectionFindAndRerankCursor;
import com.datastax.astra.client.collections.commands.options.CollectionFindAndRerankOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.hybrid.Hybrid;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.core.rerank.RerankedResult;
import com.datastax.astra.client.core.vector.DataAPIVector;

public class FindAndRerankExample {

    public static void main(String[] args) {
        // Get an existing collection
        Collection<Document> collection = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

        // Find documents
        Hybrid hybrid = new Hybrid()
            .vector(new float[]{0.25f, 0.25f, 0.25f, 0.25f, 0.25f})
            .lexical("A tree on a hill");
        CollectionFindAndRerankCursor<Document, Document> cursor =
            collection.findAndRerank(
                new CollectionFindAndRerankOptions()
                    .sort(Sort.hybrid(hybrid))
                    .limit(2)
                    .rerankOn("$lexical")
                    .rerankQuery("A house in the woods"));

        // Iterate over the results
        for (RerankedResult<Document> result : cursor) {
            System.out.println(result.getDocument());
        }
    }
}
  • With $vectorize

  • Without $vectorize

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findAndRerank": {
    "options": {
      "limit": 2
    },
    "sort": {
      "$hybrid": "A tree in the woods"
    }
  }
}'

Example without $vectorize:

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findAndRerank": {
    "options": {
      "limit": 2,
      "rerankOn": "$lexical",
      "rerankQuery": "A tree in the woods"
    },
    "sort": {
      "$hybrid": {
        "$lexical": "A house on a hill",
        "$vector": [0.3, 0.2, -0.1]
      }
    }
  }
}'

Limit the number of documents returned by the underlying searches

You can customize the number of documents returned by the underlying vector and lexical searches.

You can provide a single number, which is then used for both the vector search and the lexical search. Or, you can specify a different limit for each search. Specifying different limits can help boost the importance of one type of search over the other.

By default, each underlying search uses the same limit as the overall method.
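The three cases described above (no value, a single number, or one limit per search) can be sketched as a small normalization function. The resolve_hybrid_limits helper is hypothetical and only mirrors the behavior described in this section. The fallback for a mapping that omits one of the keys is an assumption, not documented behavior.

```python
from typing import Dict, Optional, Union


def resolve_hybrid_limits(
    limit: int,
    hybrid_limits: Optional[Union[int, Dict[str, int]]] = None,
) -> Dict[str, int]:
    """Return the per-search limits implied by the two parameters."""
    if hybrid_limits is None:
        # Default: each underlying search uses the overall limit.
        return {"$vector": limit, "$lexical": limit}
    if isinstance(hybrid_limits, int):
        # A single number applies to both searches.
        return {"$vector": hybrid_limits, "$lexical": hybrid_limits}
    # A mapping sets each search separately.
    # Assumption: a missing key falls back to the overall limit.
    return {
        "$vector": hybrid_limits.get("$vector", limit),
        "$lexical": hybrid_limits.get("$lexical", limit),
    }


print(resolve_hybrid_limits(10))
print(resolve_hybrid_limits(10, 5))
print(resolve_hybrid_limits(10, {"$vector": 8, "$lexical": 20}))
```

In the last call, the lexical search contributes up to 20 candidates against 8 from the vector search, which is one way to give lexical matches more weight before reranking.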

  • Python

  • TypeScript

  • Java

  • curl

  • With $vectorize

  • Without $vectorize

from astrapy import DataAPIClient

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
  "ASTRA_DB_API_ENDPOINT",
  token="ASTRA_DB_APPLICATION_TOKEN",
)
collection = database.get_collection("COLLECTION_NAME")

# Find documents
cursor = collection.find_and_rerank(
    sort={"$hybrid": "A tree in the woods"},
    hybrid_limits={"$vector": 8, "$lexical": 20},
)

# Iterate over the found documents
for result in cursor:
    print(result.document)

Example without $vectorize:

from astrapy import DataAPIClient

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
  "ASTRA_DB_API_ENDPOINT",
  token="ASTRA_DB_APPLICATION_TOKEN",
)
collection = database.get_collection("COLLECTION_NAME")

# Find documents
cursor = collection.find_and_rerank(
    sort={
        "$hybrid": {
            "$vector": [0.3, 0.2, -0.1],
            "$lexical": "A house on a hill",
        },
    },
    hybrid_limits={"$vector": 8, "$lexical": 20},
    rerank_query="A tree in the woods",
    rerank_on="$lexical",
)

# Iterate over the found documents
for result in cursor:
    print(result.document)
  • With $vectorize

  • Without $vectorize

import { DataAPIClient } from '@datastax/astra-db-ts';

// Get an existing collection
const client = new DataAPIClient();
const db = client.db('ASTRA_DB_API_ENDPOINT', { token: 'ASTRA_DB_APPLICATION_TOKEN' });
const collection = db.collection('COLLECTION_NAME');

(async function () {
  // Find documents
  const cursor = await collection.findAndRerank({}, {
    sort: { $hybrid: 'query text' },
    hybridLimits: { $vector: 8, $lexical: 20 },
  });

  // Iterate over the found documents
  for await (const result of cursor) {
    console.log(result.document);
  }
})();

Example without $vectorize:

import { DataAPIClient, DataAPIVector } from '@datastax/astra-db-ts';

// Get an existing collection
const client = new DataAPIClient();
const db = client.db('ASTRA_DB_API_ENDPOINT', { token: 'ASTRA_DB_APPLICATION_TOKEN' });
const collection = db.collection('COLLECTION_NAME');

(async function () {
  // Find documents
  const cursor = await collection.findAndRerank({}, {
    sort: {
      $hybrid: { $vector: new DataAPIVector([-3., .2, -.1]), $lexical: 'A tree on a hill' },
    },
    rerankQuery: 'A house on a hill',
    rerankOn: '$lexical',
    hybridLimits: { $vector: 8, $lexical: 20 },
  });

  // Iterate over the found documents
  for await (const result of cursor) {
    console.log(result.document);
  }
})();
  • With $vectorize

  • Without $vectorize

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.cursor.CollectionFindAndRerankCursor;
import com.datastax.astra.client.collections.commands.options.CollectionFindAndRerankOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.core.rerank.RerankedResult;
import java.util.Map;

public class FindAndRerankExample {

    public static void main(String[] args) {
        // Get an existing collection
        Collection<Document> collection = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

        // Find documents
        CollectionFindAndRerankCursor<Document, Document> cursor =
            collection.findAndRerank(
                new CollectionFindAndRerankOptions()
                .sort(Sort.hybrid("A tree on a hill"))
                .hybridLimits(Map.of("$vector", 8, "$lexical", 20)));

        // Iterate over the results
        for (RerankedResult<Document> result : cursor) {
          System.out.println(result.getDocument());
        }
    }
}
package com.examples;

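import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.cursor.CollectionFindAndRerankCursor;
import com.datastax.astra.client.collections.commands.options.CollectionFindAndRerankOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.hybrid.Hybrid;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.core.rerank.RerankedResult;
import com.datastax.astra.client.core.vector.DataAPIVector;
import java.util.Map;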

public class FindAndRerankExample {

    public static void main(String[] args) {

        // Get an existing collection
        Collection<Document> collection = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

        // Find documents
        Hybrid hybrid = new Hybrid()
            .vector(new DataAPIVector(new float[]{0.25f, 0.25f, 0.25f, 0.25f, 0.25f}))
            .lexical("A tree on a hill");
        CollectionFindAndRerankCursor<Document, Document> cursor =
            collection.findAndRerank(
                new CollectionFindAndRerankOptions()
                    .sort(Sort.hybrid(hybrid))
                    .hybridLimits(Map.of("$vector", 8, "$lexical", 20))
                    .rerankOn("$lexical")
                    .rerankQuery("A house in the woods"));

        // Iterate over the results
        for (RerankedResult<Document> result : cursor) {
          System.out.println(result.getDocument());
        }
    }
}
  • With $vectorize

  • Without $vectorize

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findAndRerank": {
    "options": {
      "hybridLimits": {
        "$lexical": 20,
        "$vector": 8
      }
    },
    "sort": {
      "$hybrid": "A tree in the woods"
    }
  }
}'
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findAndRerank": {
    "options": {
      "hybridLimits": {
        "$lexical": 20,
        "$vector": 8
      },
      "rerankOn": "$lexical",
      "rerankQuery": "A house on a hill"
    },
    "sort": {
      "$hybrid": {
        "$lexical": "A house on a hill",
        "$vector": [0.3, 0.2, -0.1]
      }
    }
  }
}'
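
For HTTP clients other than curl, the same request body can be assembled programmatically. The following is a minimal Python sketch that only builds and serializes a payload shaped like the example above. The helper name build_find_and_rerank_payload and its parameters are hypothetical and not part of any client library, and sending the request is left out.

```python
import json
from typing import Any, Dict, List, Optional


def build_find_and_rerank_payload(
    lexical: str,
    vector: List[float],
    rerank_query: str,
    rerank_on: str = "$lexical",
    vector_limit: Optional[int] = None,
    lexical_limit: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a findAndRerank body shaped like the curl example above.

    Hypothetical helper for illustration only.
    """
    options: Dict[str, Any] = {
        "rerankOn": rerank_on,
        "rerankQuery": rerank_query,
    }
    # Only add hybridLimits when both limits are given.
    if vector_limit is not None and lexical_limit is not None:
        options["hybridLimits"] = {
            "$lexical": lexical_limit,
            "$vector": vector_limit,
        }
    return {
        "findAndRerank": {
            "options": options,
            "sort": {"$hybrid": {"$lexical": lexical, "$vector": vector}},
        }
    }


payload = build_find_and_rerank_payload(
    lexical="A house on a hill",
    vector=[0.3, 0.2, -0.1],
    rerank_query="A house on a hill",
    vector_limit=8,
    lexical_limit=20,
)

# Serialize the payload; this string is what would go in the request body.
body = json.dumps(payload, indent=2)
print(body)
```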

Include the scores in the response

You can request the scores to be returned alongside the documents.

The reranking retrieval process assigns scores to each document, such as vector similarity and reranker scores, and then compares those scores across all retrieved documents to determine the best overall results.

  • Python

  • TypeScript

  • Java

  • curl

Each object yielded by the returned cursor contains a scores attribute. This attribute is a dictionary that maps each score name to its value. For example: {"$vector": 0.81, "$rerank": 0.12}.

  • With $vectorize

  • Without $vectorize

from astrapy import DataAPIClient

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
  "ASTRA_DB_API_ENDPOINT",
  token="ASTRA_DB_APPLICATION_TOKEN",
)
collection = database.get_collection("COLLECTION_NAME")

# Find documents
cursor = collection.find_and_rerank(
    sort={"$hybrid": "A tree in the woods"},
    include_scores=True,
)

# Iterate over the found documents and scores
for result in cursor:
    print(result.document)
    # This prints something like:
    #   {'$rerank': 0.5413, '$vector': 0.6429, ...}
    print(result.scores)
from astrapy import DataAPIClient

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
  "ASTRA_DB_API_ENDPOINT",
  token="ASTRA_DB_APPLICATION_TOKEN",
)
collection = database.get_collection("COLLECTION_NAME")

# Find documents
cursor = collection.find_and_rerank(
    sort={
        "$hybrid": {
            "$vector": [0.3, 0.2, -0.1],
            "$lexical": "A house on a hill",
        },
    },
    include_scores=True,
    rerank_query="A tree in the woods",
    rerank_on="$lexical",
)

# Iterate over the found documents and scores
for result in cursor:
    print(result.document)
    # This prints something like:
    #   {'$rerank': 0.5413, '$vector': 0.6429, ...}
    print(result.scores)

Each object yielded by the returned cursor contains a scores attribute. This attribute is an object that maps each score name to its value. For example: {$vector: 0.81, $rerank: 0.12}.

  • With $vectorize

  • Without $vectorize

import { DataAPIClient } from '@datastax/astra-db-ts';

// Get an existing collection
const client = new DataAPIClient();
const db = client.db('ASTRA_DB_API_ENDPOINT', { token: 'ASTRA_DB_APPLICATION_TOKEN' });
const collection = db.collection('COLLECTION_NAME');

(async function () {
  // Find documents
  const cursor = await collection.findAndRerank({}, {
    sort: { $hybrid: 'query text' },
    includeScores: true,
  });

  // Iterate over the found documents
  for await (const result of cursor) {
    console.log(result.document);
    // This prints something like:
    // {'$rerank': 0.5413, '$vector': 0.6429, ...}
    console.log(result.scores);
  }
})();
import { DataAPIClient, DataAPIVector } from '@datastax/astra-db-ts';

// Get an existing collection
const client = new DataAPIClient();
const db = client.db('ASTRA_DB_API_ENDPOINT', { token: 'ASTRA_DB_APPLICATION_TOKEN' });
const collection = db.collection('COLLECTION_NAME');

(async function () {
  // Find documents
  const cursor = await collection.findAndRerank({}, {
    sort: {
      $hybrid: { $vector: new DataAPIVector([-3., .2, -.1]), $lexical: 'A tree on a hill' },
    },
    rerankQuery: 'A house on a hill',
    rerankOn: '$lexical',
    includeScores: true,
  });

  // Iterate over the found documents
  for await (const result of cursor) {
    console.log(result.document);
    // This prints something like:
    // {'$rerank': 0.5413, '$vector': 0.6429, ...}
    console.log(result.scores);
  }
})();

Each object yielded by the returned cursor contains a scores attribute. This attribute is an object that maps each score name to its value. For example: {$vector: 0.81, $rerank: 0.12}.

  • With $vectorize

  • Without $vectorize

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.cursor.CollectionFindAndRerankCursor;
import com.datastax.astra.client.collections.commands.options.CollectionFindAndRerankOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.core.rerank.RerankedResult;
import java.util.Map;

public class FindAndRerankExample {

    public static void main(String[] args) {
        // Get an existing collection
        Collection<Document> collection = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

        // Find documents
        CollectionFindAndRerankCursor<Document, Document> cursor =
            collection.findAndRerank(
                new CollectionFindAndRerankOptions()
                .sort(Sort.hybrid("A tree on a hill"))
                .includeScores(true));

        // Iterate over the results
        for (RerankedResult<Document> result : cursor) {
          // This prints something like:
          // {'$rerank': 0.5413, '$vector': 0.6429, ...}
          System.out.println(result.getScores());
        }
    }
}
package com.examples;

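import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.cursor.CollectionFindAndRerankCursor;
import com.datastax.astra.client.collections.commands.options.CollectionFindAndRerankOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.hybrid.Hybrid;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.core.rerank.RerankedResult;
import com.datastax.astra.client.core.vector.DataAPIVector;
import java.util.Map;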

public class FindAndRerankExample {

    public static void main(String[] args) {

        // Get an existing collection
        Collection<Document> collection = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

        // Find documents
        Hybrid hybrid = new Hybrid()
            .vector(new DataAPIVector(new float[]{0.25f, 0.25f, 0.25f, 0.25f, 0.25f}))
            .lexical("A tree on a hill");
        CollectionFindAndRerankCursor<Document, Document> cursor =
            collection.findAndRerank(
                new CollectionFindAndRerankOptions()
                    .sort(Sort.hybrid(hybrid))
                    .includeScores(true)
                    .rerankOn("$lexical")
                    .rerankQuery("A house in the woods"));

        // Iterate over the results
        for (RerankedResult<Document> result : cursor) {
          // This prints something like:
          // {'$rerank': 0.5413, '$vector': 0.6429, ...}
          System.out.println(result.getScores());
        }
    }
}

When the includeScores option is true, the response includes a status.documentResponses property, which is a list of the scores from the retrieval process for each document.

  • With $vectorize

  • Without $vectorize

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findAndRerank": {
    "options": {
      "includeScores": true
    },
    "sort": {
      "$hybrid": "A tree in the woods"
    }
  }
}'
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findAndRerank": {
    "options": {
      "includeScores": true,
      "rerankOn": "$lexical",
      "rerankQuery": "A tree in the woods"
    },
    "sort": {
      "$hybrid": {
        "$lexical": "A house on a hill",
        "$vector": [0.3, 0.2, -0.1]
      }
    }
  }
}'

Example response:

{
  "data": {
    "documents": [
      {
        "_id": "doc_s",
        "content": "once upon a time",
        "tag": "x"
      },
      {
        "_id": "doc_m",
        "content": "ascomycota lack clamp connections",
        "tag": "x"
      }
    ],
    "nextPageState": null
  },
  "status": {
    "documentResponses": [
      {
        "scores": {
          "$rerank": 0.5413,
          "$vector": 0.6429
        }
      },
      {
        "scores": {
          "$rerank": 0.1871,
          "$vector": 0.7204
        }
      }
    ]
  }
}
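
To use the scores in application code, each entry in status.documentResponses has to be matched to a document. The Python sketch below parses a response shaped like the example above and pairs the two lists by position. That both lists share the same order is an assumption inferred from the example, so verify it against your own responses; the variable names are illustrative.

```python
import json

# A response shaped like the example above.
raw_response = """
{
  "data": {
    "documents": [
      {"_id": "doc_s", "content": "once upon a time", "tag": "x"},
      {"_id": "doc_m", "content": "ascomycota lack clamp connections", "tag": "x"}
    ],
    "nextPageState": null
  },
  "status": {
    "documentResponses": [
      {"scores": {"$rerank": 0.5413, "$vector": 0.6429}},
      {"scores": {"$rerank": 0.1871, "$vector": 0.7204}}
    ]
  }
}
"""

response = json.loads(raw_response)
documents = response["data"]["documents"]
score_entries = response["status"]["documentResponses"]

# Assumption: documents[i] corresponds to documentResponses[i].
scored = [
    (doc["_id"], entry["scores"])
    for doc, entry in zip(documents, score_entries)
]

for doc_id, scores in scored:
    print(doc_id, scores["$rerank"], scores["$vector"])
```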

Include the sort vector in the response

You can include the sort vector in the result. This can be useful if you use $vectorize and a search string in the sort parameter, since you don’t know the sort vector in advance.

  • Python

  • TypeScript

  • Java

  • curl

Calling .get_sort_vector() on the returned cursor reads the sort vector.

  • With $vectorize

  • Without $vectorize

from astrapy import DataAPIClient

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
  "ASTRA_DB_API_ENDPOINT",
  token="ASTRA_DB_APPLICATION_TOKEN",
)
collection = database.get_collection("COLLECTION_NAME")

# Find documents
cursor = collection.find_and_rerank(
    sort={"$hybrid": "A tree in the woods"},
    include_sort_vector=True,
)

# Inspect the sort vector
print(cursor.get_sort_vector())

# Iterate over the found documents
for result in cursor:
    print(result.document)
from astrapy import DataAPIClient

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
  "ASTRA_DB_API_ENDPOINT",
  token="ASTRA_DB_APPLICATION_TOKEN",
)
collection = database.get_collection("COLLECTION_NAME")

# Find documents
cursor = collection.find_and_rerank(
    sort={
        "$hybrid": {
            "$vector": [0.3, 0.2, -0.1],
            "$lexical": "A house on a hill",
        },
    },
    include_sort_vector=True,
    rerank_query="A tree in the woods",
    rerank_on="$lexical",
)

# Inspect the sort vector
print(cursor.get_sort_vector())

# Iterate over the found documents
for result in cursor:
    print(result.document)

Calling .getSortVector() on the returned cursor reads the sort vector.

  • With $vectorize

  • Without $vectorize

import { DataAPIClient } from '@datastax/astra-db-ts';

// Get an existing collection
const client = new DataAPIClient();
const db = client.db('ASTRA_DB_API_ENDPOINT', { token: 'ASTRA_DB_APPLICATION_TOKEN' });
const collection = db.collection('COLLECTION_NAME');

(async function () {
  // Find documents
  const cursor = await collection.findAndRerank({}, {
    sort: { $hybrid: 'query text' },
    includeSortVector: true,
  });

  // Inspect the sort vector
  console.log(await cursor.getSortVector());

  // Iterate over the found documents
  for await (const result of cursor) {
    console.log(result.document);
  }
})();
import { DataAPIClient, DataAPIVector } from '@datastax/astra-db-ts';

// Get an existing collection
const client = new DataAPIClient();
const db = client.db('ASTRA_DB_API_ENDPOINT', { token: 'ASTRA_DB_APPLICATION_TOKEN' });
const collection = db.collection('COLLECTION_NAME');

(async function () {
  // Find documents
  const cursor = await collection.findAndRerank({}, {
    sort: {
      $hybrid: { $vector: new DataAPIVector([-3., .2, -.1]), $lexical: 'A tree on a hill' },
    },
    rerankQuery: 'A house on a hill',
    rerankOn: '$lexical',
    includeSortVector: true,
  });

  // Inspect the sort vector
  console.log(await cursor.getSortVector());

  // Iterate over the found documents
  for await (const result of cursor) {
    console.log(result.document);
  }
})();

Calling .getSortVector() on the returned cursor reads the sort vector.

  • With $vectorize

  • Without $vectorize

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.cursor.CollectionFindAndRerankCursor;
import com.datastax.astra.client.collections.commands.options.CollectionFindAndRerankOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.core.rerank.RerankedResult;
import java.util.Map;

public class FindAndRerankExample {

    public static void main(String[] args) {
        // Get an existing collection
        Collection<Document> collection = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

        // Find documents
        CollectionFindAndRerankCursor<Document, Document> cursor =
            collection.findAndRerank(
                new CollectionFindAndRerankOptions()
                    .sort(Sort.hybrid("A tree on a hill"))
                    .includeSortVector(true));

        // Get the sort vector
        cursor.getSortVector().ifPresent(vector -> {
            System.out.println("Sort Vector: " + vector);
        });
    }
}
package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.cursor.CollectionFindAndRerankCursor;
import com.datastax.astra.client.collections.commands.options.CollectionFindAndRerankOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.hybrid.Hybrid;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.core.rerank.RerankedResult;
import com.datastax.astra.client.core.vector.DataAPIVector;

public class FindAndRerankExample {

    public static void main(String[] args) {

        // Get an existing collection
        Collection<Document> collection = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

        // Find documents
        Hybrid hybrid = new Hybrid()
            .vector(new DataAPIVector(new float[]{0.25f, 0.25f, 0.25f, 0.25f, 0.25f}))
            .lexical("A tree on a hill");
        CollectionFindAndRerankCursor<Document, Document> cursor =
            collection.findAndRerank(
                new CollectionFindAndRerankOptions()
                    .sort(Sort.hybrid(hybrid))
                    .includeSortVector(true)
                    .rerankOn("$lexical")
                    .rerankQuery("A house in the woods"));

        cursor.getSortVector().ifPresent(vector -> {
          System.out.println("Sort Vector: " + vector);
        });
    }
}

If the includeSortVector option is true, the response includes a status.sortVector property, which is the sort vector used for the underlying vector search.

  • With $vectorize

  • Without $vectorize

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findAndRerank": {
    "options": {
      "includeSortVector": true
    },
    "sort": {
      "$hybrid": "A tree in the woods"
    }
  }
}'
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findAndRerank": {
    "options": {
      "includeSortVector": true,
      "rerankOn": "$lexical",
      "rerankQuery": "A tree in the woods"
    },
    "sort": {
      "$hybrid": {
        "$lexical": "A house on a hill",
        "$vector": [0.3, 0.2, -0.1]
      }
    }
  }
}'

Example response:

{
  "data": {
    "documents": [
      {
        "_id": "doc_s",
        "content": "once upon a time",
        "tag": "x"
      },
      {
        "_id": "doc_m",
        "content": "ascomycota lack clamp connections",
        "tag": "x"
      }
    ],
    "nextPageState": null
  },
  "status": {
    "sortVector": [0.3, 0.2, -0.1]
  }
}
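
As a short Python sketch, the sort vector can be read from a parsed response shaped like the example above. The lookup is written so that a response without status.sortVector (for example, when includeSortVector was not set) yields None instead of raising an error; the variable names are illustrative.

```python
import json

# A response shaped like the example above (documents abbreviated).
raw_response = """
{
  "data": {
    "documents": [{"_id": "doc_s"}, {"_id": "doc_m"}],
    "nextPageState": null
  },
  "status": {"sortVector": [0.3, 0.2, -0.1]}
}
"""

response = json.loads(raw_response)

# A missing or null "status", or a missing "sortVector", results in None.
sort_vector = (response.get("status") or {}).get("sortVector")

if sort_vector is None:
    print("No sort vector in the response")
else:
    print(f"Sort vector with {len(sort_vector)} dimensions: {sort_vector}")
```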

Include only specific fields in the response

To specify which fields to include or exclude in the returned documents, use a projection.

Certain fields, like $vector and $vectorize, are excluded by default and will only be returned if you specify them in the projection. Certain fields, like _id, are included by default.
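
As a rough illustration of what an inclusion projection does, the following Python sketch filters a document locally so that only the requested fields and _id remain. It only mimics the behavior described above for top-level fields and always keeps _id. The function apply_inclusion_projection and the sample document are hypothetical; the actual projection is applied by the database.

```python
from typing import Any, Dict


def apply_inclusion_projection(
    document: Dict[str, Any], projection: Dict[str, bool]
) -> Dict[str, Any]:
    """Keep top-level fields marked True in the projection, plus '_id'.

    Hypothetical local illustration; not how the database implements it.
    """
    wanted = {field for field, include in projection.items() if include}
    wanted.add("_id")  # '_id' is included by default
    return {key: value for key, value in document.items() if key in wanted}


# Hypothetical sample document.
document = {
    "_id": "book_1",
    "title": "A tree in the woods",
    "isCheckedOut": False,
    "numberOfPages": 120,
}

projected = apply_inclusion_projection(
    document, {"isCheckedOut": True, "title": True}
)
print(projected)
```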

  • Python

  • TypeScript

  • Java

  • curl

  • With $vectorize

  • Without $vectorize

from astrapy import DataAPIClient

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
  "ASTRA_DB_API_ENDPOINT",
  token="ASTRA_DB_APPLICATION_TOKEN",
)
collection = database.get_collection("COLLECTION_NAME")

# Find documents
cursor = collection.find_and_rerank(
    sort={"$hybrid": "A tree in the woods"},
    projection={"isCheckedOut": True, "title": True},
)

# Iterate over the found documents
for result in cursor:
    # Documents will only have the requested fields
    # (plus '_id' by default projection)
    print(result.document)
from astrapy import DataAPIClient

# Get an existing collection
client = DataAPIClient()
database = client.get_database(
  "ASTRA_DB_API_ENDPOINT",
  token="ASTRA_DB_APPLICATION_TOKEN",
)
collection = database.get_collection("COLLECTION_NAME")

# Find documents
cursor = collection.find_and_rerank(
    sort={
        "$hybrid": {
            "$vector": [0.3, 0.2, -0.1],
            "$lexical": "A house on a hill",
        },
    },
    projection={"isCheckedOut": True, "title": True},
    rerank_query="A tree in the woods",
    rerank_on="$lexical",
)

# Iterate over the found documents
for result in cursor:
    # Documents will only have the requested fields
    # (plus '_id' by default projection)
    print(result.document)

To ensure the correct type is used, provide an explicit type parameter when using a projection. Otherwise, the document type is inferred as Partial<TRaw>.

  • With $vectorize

  • Without $vectorize

import { DataAPIClient } from '@datastax/astra-db-ts';

// Get an existing collection
const client = new DataAPIClient();
const db = client.db('ASTRA_DB_API_ENDPOINT', { token: 'ASTRA_DB_APPLICATION_TOKEN' });
const collection = db.collection('COLLECTION_NAME');

(async function () {
  type Projected = { isCheckedOut: boolean, title: string };

  // Find documents
  const cursor = await collection.findAndRerank<Projected>({}, {
    sort: { $hybrid: 'query text' },
    projection: { isCheckedOut: 1, title: 1 },
  });

  // Iterate over the found documents
  for await (const result of cursor) {
    // Documents will only have the requested fields
    // (plus '_id' by default projection)
    console.log(result.document);
  }
})();
import { DataAPIClient, DataAPIVector } from '@datastax/astra-db-ts';

// Get an existing collection
const client = new DataAPIClient();
const db = client.db('ASTRA_DB_API_ENDPOINT', { token: 'ASTRA_DB_APPLICATION_TOKEN' });
const collection = db.collection('COLLECTION_NAME');

(async function () {
  type Projected = { isCheckedOut: boolean, title: string };

  // Find documents
  const cursor = await collection.findAndRerank<Projected>({}, {
    sort: {
      $hybrid: { $vector: new DataAPIVector([-3., .2, -.1]), $lexical: 'A tree on a hill' },
    },
    projection: { isCheckedOut: 1, title: 1 },
    rerankQuery: 'A house on a hill',
    rerankOn: '$lexical',
  });

  // Iterate over the found documents
  for await (const result of cursor) {
    // Documents will only have the requested fields
    // (plus '_id' by default projection)
    console.log(result.document);
  }
})();

To ensure the correct type is used, pass the target class explicitly (for example, Projected.class) when using a projection.

  • With $vectorize

  • Without $vectorize

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.cursor.CollectionFindAndRerankCursor;
import com.datastax.astra.client.collections.commands.options.CollectionFindAndRerankOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.hybrid.Hybrid;
import com.datastax.astra.client.core.query.Filter;
import com.datastax.astra.client.core.query.Projection;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.core.rerank.RerankedResult;
import lombok.Data;

public class FindAndRerankExample {

    @Data
    public static class Projected {
        boolean isCheckedOut;
        String title;
    }

    public static void main(String[] args) {
        // Get an existing collection
        Collection<Document> collection = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

        // Find documents
        Filter filter = null;
        CollectionFindAndRerankCursor<Document, Projected> cursor =
            collection.findAndRerank(
                filter,
                new CollectionFindAndRerankOptions()
                    .projection(Projection.include("isCheckedOut", "title"))
                    .sort(Sort.hybrid("A tree on a hill")),
                Projected.class
            );

        // You can also specify a projection at the cursor level
        cursor.project(Projection.include("title"));

        // Iterate over the results
        for (RerankedResult<Projected> result : cursor) {
          System.out.println(result.getDocument());
        }
    }
}
package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.cursor.CollectionFindAndRerankCursor;
import com.datastax.astra.client.collections.commands.options.CollectionFindAndRerankOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.hybrid.Hybrid;
import com.datastax.astra.client.core.query.Filter;
import com.datastax.astra.client.core.query.Projection;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.core.rerank.RerankedResult;
import com.datastax.astra.client.core.vector.DataAPIVector;
import lombok.Data;

public class FindAndRerankExample {

    @Data
    public static class Projected {
        boolean isCheckedOut;
        String title;
    }

    public static void main(String[] args) {

        // Get an existing collection
        Collection<Document> collection = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

        // Find documents
        Hybrid hybrid = new Hybrid()
            .vector(new DataAPIVector(new float[]{0.25f, 0.25f, 0.25f, 0.25f, 0.25f}))
            .lexical("A tree on a hill");
        Filter filter = null;
        CollectionFindAndRerankCursor<Document, Projected> cursor =
            collection.findAndRerank(
                filter,
                new CollectionFindAndRerankOptions()
                    .projection(Projection.include("isCheckedOut", "title"))
                    .sort(Sort.hybrid(hybrid))
                    .rerankOn("$lexical")
                    .rerankQuery("A house in the woods"),
                Projected.class);

        // You can also specify a projection at the cursor level
        cursor.project(Projection.include("title"));

        // Iterate over the results
        for (RerankedResult<Projected> result : cursor) {
          System.out.println(result.getDocument());
        }
    }
}
  • With $vectorize

  • Without $vectorize

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findAndRerank": {
    "options": {},
    "projection": {
      "isCheckedOut": true,
      "title": true
    },
    "sort": {
      "$hybrid": "A tree in the woods"
    }
  }
}'
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/COLLECTION_NAME" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findAndRerank": {
    "options": {
      "rerankOn": "$lexical",
      "rerankQuery": "A tree in the woods"
    },
    "projection": {
      "isCheckedOut": true,
      "title": true
    },
    "sort": {
      "$hybrid": {
        "$lexical": "A house on a hill",
        "$vector": [0.3, 0.2, -0.1]
      }
    }
  }
}'

Client reference

For the Python, TypeScript, and Java clients, see the corresponding client reference.

Client reference documentation is not applicable for HTTP.

