Sort clauses for collections

Sort and filter clauses can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

Data API commands, such as find, findOne, deleteOne, and updateOne, can use sort clauses to order results, either by the value of a field (ascending or descending) or by similarity to a given vector.

Additionally, you can use a projection to include specific document properties in the response. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

The following notes apply to the Python, TypeScript, and Java clients:

  • You can’t use the $vector and $vectorize sort clauses together.

  • Some combinations of arguments impose an implicit upper bound on the number of documents that are returned by the Data API:

    • Vector searches can return no more than 1000 documents per search operation, regardless of the limit parameter.

    • When using an ascending or descending sort criterion, the Data API returns up to 20 documents at once. The returned documents are the top results across the whole collection based on the filter criteria.

      These provisions can also apply when running subsequent commands on cursors, such as .distinct().

      For ascending or descending sort clauses that do not automatically paginate, you can sometimes use the limit and skip options to control the number of documents returned and the starting point of the results, as a form of manual pagination.

  • When you don’t specify sorting criteria (by vector or otherwise), the cursor can scroll through an arbitrary number of documents because the Data API and the client periodically exchange new chunks of documents.

    If documents are added or removed after starting a find operation, the cursor behavior depends on database internals. There is no guarantee that the cursor will pick up such "real-time" changes in the data.
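The manual pagination mentioned above comes down to offset arithmetic: page n (counting from 0) with page size k uses skip = n * k and limit = k. The following Python sketch shows that arithmetic against an in-memory list that stands in for documents already returned in sorted order. The helper page_window and the SORTED_PAGE_CAP constant are hypothetical names; with a real client you would pass the resulting skip and limit values to find together with an ascending or descending sort.

```python
# Hypothetical constant: the 20-document upper bound for ascending or
# descending sorts described in the notes above.
SORTED_PAGE_CAP = 20


def page_window(page_number: int, page_size: int) -> tuple[int, int]:
    """Return (skip, limit) for a zero-based page number.

    The page size is clamped to SORTED_PAGE_CAP, because a sorted find
    returns at most that many documents per request.
    """
    if page_number < 0 or page_size < 1:
        raise ValueError("page_number must be >= 0 and page_size >= 1")
    size = min(page_size, SORTED_PAGE_CAP)
    return page_number * size, size


# In-memory stand-in for 45 documents already ordered by "seq" ascending.
sorted_docs = [{"seq": n} for n in range(1, 46)]

for page in range(3):
    skip, limit = page_window(page, 20)
    chunk = sorted_docs[skip:skip + limit]
    # Show the window and the first few "seq" values on each page.
    print(page, skip, limit, [d["seq"] for d in chunk][:3], len(chunk))
```

The last page is shorter (5 documents here), which is how a manual pagination loop can tell that it has reached the end of the results.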

Python

When no particular order is required:

sort={}  # (default when parameter not provided)

When sorting by a certain value in ascending/descending order:

from astrapy.constants import SortDocuments

# Ascending sort
sort={"field": SortDocuments.ASCENDING}

# Descending sort
sort={"field": SortDocuments.DESCENDING}

Be aware of the order when chaining multiple sorts. For example, when sorting first by a specific field and then by a specific subfield:

sort={
    "field": SortDocuments.ASCENDING,
    "subfield": SortDocuments.ASCENDING,
}

Python 3.7 and later preserve dictionary insertion order, but for clarity you may prefer to use a collections.OrderedDict for chained sorts.
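To see why key order matters, the following sketch emulates a multi-key sort clause locally with Python's stable sort. It only illustrates sort priority, under the assumption that the first key takes precedence; the Data API's handling of ties, missing fields, and mixed value types may differ. emulate_sort is a hypothetical helper, and the constants 1 and -1 stand in for SortDocuments.ASCENDING and SortDocuments.DESCENDING.

```python
from collections import OrderedDict

# Same values as SortDocuments.ASCENDING / SortDocuments.DESCENDING.
ASCENDING, DESCENDING = 1, -1


def emulate_sort(docs, sort):
    """Order docs like a multi-key sort clause: earlier keys take priority.

    Python's sort is stable, so sorting by the last key first and the
    first key last yields the combined ordering.
    """
    result = list(docs)
    for field, direction in reversed(list(sort.items())):
        result.sort(key=lambda d: d[field], reverse=(direction == DESCENDING))
    return result


people = [
    {"name": "Jane", "age": 25},
    {"name": "Dave", "age": 40},
    {"name": "Jack", "age": 40},
]

by_age_then_name = OrderedDict([("age", ASCENDING), ("name", ASCENDING)])
by_name_then_age = OrderedDict([("name", ASCENDING), ("age", ASCENDING)])
mixed = OrderedDict([("age", ASCENDING), ("name", DESCENDING)])

print([d["name"] for d in emulate_sort(people, by_age_then_name)])
# ['Jane', 'Dave', 'Jack']
print([d["name"] for d in emulate_sort(people, by_name_then_age)])
# ['Dave', 'Jack', 'Jane']
print([d["name"] for d in emulate_sort(people, mixed)])
# ['Jane', 'Jack', 'Dave']
```

The same two keys give different results depending on which comes first, which is why the order of keys in the sort dictionary matters.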

You can use sort to perform a vector search:

# Sort by similarity to the specified vector.
sort={"$vector": [0.4, 0.15, -0.5]}

# Generate a vector from the given text, then sort by similarity to it.
# Requires a valid vectorize integration.
sort={"$vectorize": "Text to vectorize"}
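Because $vector and $vectorize can't be combined in one sort clause (see the notes above), you may want to catch that mistake before sending a request. check_vector_sort is a hypothetical client-side helper, not part of astrapy; the Data API performs its own validation regardless.

```python
VECTOR_SORT_KEYS = {"$vector", "$vectorize"}


def check_vector_sort(sort: dict) -> None:
    """Raise ValueError if a sort clause uses both $vector and $vectorize."""
    used = VECTOR_SORT_KEYS.intersection(sort)
    if len(used) > 1:
        raise ValueError("use either $vector or $vectorize in a sort clause, not both")


check_vector_sort({"$vector": [0.4, 0.15, -0.5]})      # accepted
check_vector_sort({"$vectorize": "Text to vectorize"})  # accepted
try:
    check_vector_sort({"$vector": [0.4, 0.15, -0.5], "$vectorize": "Text to vectorize"})
except ValueError as err:
    print(err)  # use either $vector or $vectorize in a sort clause, not both
```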
Sort example
from astrapy import DataAPIClient
from astrapy.constants import SortDocuments

client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.get_collection("my_collection")

# No sort clause: the documents come back in an arbitrary order.
filter = {"seq": {"$exists": True}}
for doc in collection.find(filter, projection={"seq": True}, limit=5):
    print(doc["seq"])
# will print e.g.:
#   37
#   35
#   10
#   36
#   27

# Descending sort on "seq".
cursor1 = collection.find(
    {},
    limit=4,
    sort={"seq": SortDocuments.DESCENDING},
)
print([doc["_id"] for doc in cursor1])
# prints: ['97e85f81-...', '1581efe4-...', '...', '...']

# distinct() on a cursor.
cursor2 = collection.find({}, limit=3)
print(cursor2.distinct("seq"))
# prints: [37, 35, 10]

# Vector search. The collection must be vector-enabled with dimension 2.
collection.insert_many([
    {"tag": "A", "$vector": [4, 5]},
    {"tag": "B", "$vector": [3, 4]},
    {"tag": "C", "$vector": [3, 2]},
    {"tag": "D", "$vector": [4, 1]},
    {"tag": "E", "$vector": [2, 5]},
])
ann_tags = [
    document["tag"]
    for document in collection.find(
        {},
        sort={"$vector": [3, 3]},
        limit=3,
    )
]
print(ann_tags)
# prints: ['A', 'B', 'C']
# (assuming the collection has metric VectorMetric.COSINE)
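To see why the example prints ['A', 'B', 'C'], you can rank the same five vectors by cosine similarity to [3, 3] locally. This brute-force sketch is only an illustration: as the variable name ann_tags suggests, the server performs approximate nearest-neighbor search, so on large collections its results are not guaranteed to match an exact ranking.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


docs = {"A": [4, 5], "B": [3, 4], "C": [3, 2], "D": [4, 1], "E": [2, 5]}
query = [3, 3]

# Exact ranking: most similar first.
ranked = sorted(docs, key=lambda tag: cosine(docs[tag], query), reverse=True)
print(ranked[:3])  # ['A', 'B', 'C']
```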

TypeScript

Sort is weakly typed by default. For a more strongly typed alternative that also provides full autocomplete, see StrictSort<Schema>.

When no particular order is required:

{ sort: {} }  // (default when parameter not provided)

When sorting by a certain value in ascending/descending order:

{ sort: { field: +1 } }  // ascending
{ sort: { field: -1 } }  // descending

Be aware of the order when chaining multiple sorts: ES2015+ guarantees that string keys keep their insertion order, so the order of keys in the object sets the sort priority. For example, when sorting first by a field and then by a specific subfield:

{ sort: { field: 1, subfield: 1 } }

You can use sort to perform a vector search:

// Sort by similarity to the specified vector.
{ sort: { $vector: [0.4, 0.15, -0.5] } }

// Generate a vector from the given text, then sort by similarity to it.
// Requires a valid vectorize integration.
{ sort: { $vectorize: "Text to vectorize" } }

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection (vector-enabled with dimension 5, for the inserts below)
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertMany([
    { name: 'Jane', age: 25, $vector: [1.0, 1.0, 1.0, 1.0, 1.0] },
    { name: 'Dave', age: 40, $vector: [0.4, 0.5, 0.6, 0.7, 0.8] },
    { name: 'Jack', age: 40, $vector: [0.1, 0.9, 0.0, 0.5, 0.7] },
  ]);

  // Sort by age ascending, then by name descending (Jane, Jack, Dave)
  const sorted1 = await collection.find({}, { sort: { age: 1, name: -1 } }).toArray();
  console.log(sorted1.map(d => d.name));

  // Sort by vector similarity (Jane, Dave, Jack)
  const sorted2 = await collection.find({}, { sort: { $vector: [1, 1, 1, 1, 1] } }).toArray();
  console.log(sorted2.map(d => d.name));
})();

Java

The sort() operations are optional. Use them only when needed.

Be aware of the order when chaining multiple sorts:

Sort s1 = Sorts.ascending("field1");
Sort s2 = Sorts.descending("field2");
FindOptions.Builder.sort(s1, s2);

You can use sort to perform a vector search:

// Sort by similarity to the specified vector.
FindOptions.Builder
 .sort(new float[] {0.4f, 0.15f, -0.5f});

// Generate a vector from the given text, then sort by similarity to it.
// Requires a valid vectorize integration.
FindOptions.Builder
 .sort("Text to vectorize");

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOptions;
import com.datastax.astra.client.model.Sort;
import com.datastax.astra.client.model.Sorts;

import static com.datastax.astra.client.model.Filters.lt;

public class WorkingWithSorts {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Sort Clause for a vector
        Sorts.vector(new float[] {0.25f, 0.25f, 0.25f, 0.25f, 0.25f});

        // Sort Clause for other fields
        Sort s1 = Sorts.ascending("field1");
        Sort s2 = Sorts.descending("field2");

        // Build the sort clause
        new FindOptions().sort(s1, s2);

        // Adding vector
        new FindOptions().sort(new float[] {0.25f, 0.25f, 0.25f, 0.25f, 0.25f}, s1, s2);

    }
}

curl

  • You can’t use the $vector and $vectorize sort clauses together.

  • Some combinations of arguments impose an implicit upper bound on the number of documents that are returned by the Data API:

    • Vector searches (sort by $vector or $vectorize) return a single page of up to 1000 documents. You can set a lower limit, but not a higher one.

    • Searches whose sort is ascending, descending, or unspecified return matching documents in batches of up to 20. The returned documents are the top results across the whole collection based on the filter criteria. Pagination occurs if there are more than 20 matching documents, but in some cases nextPageState is null even when more results exist. For information about handling pagination, see Find documents.

  • If documents are added or removed after starting a find operation, paging behavior depends on database internals. There is no guarantee that pagination will pick up such "real-time" changes in the data.
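When you call the HTTP API directly, you handle pagination yourself. The following is a minimal sketch, assuming (as described in Find documents) that you request the next page by sending the returned nextPageState back as options.pageState. fetch_all keeps requesting pages until nextPageState is null. send_find is a hypothetical callable that posts a JSON body to the collection endpoint and returns the parsed response; here it is replaced by canned responses so the loop runs offline. Because nextPageState can be null even when more results exist, this loop may stop before every matching document is returned.

```python
def fetch_all(send_find, command):
    """Collect documents from every page of a non-vector find command.

    command is the inner find object, for example {"filter": {}}.
    send_find is a hypothetical callable: it takes the full JSON body
    and returns the parsed JSON response.
    """
    documents = []
    page_state = None
    while True:
        body = {"find": dict(command)}
        if page_state is not None:
            # Assumption: the next page is requested via options.pageState.
            body["find"]["options"] = {**command.get("options", {}), "pageState": page_state}
        data = send_find(body)["data"]
        documents.extend(data["documents"])
        page_state = data.get("nextPageState")
        if page_state is None:
            return documents


# Canned two-page responses standing in for real HTTP calls.
PAGES = {
    None: {"data": {"documents": [{"_id": str(i)} for i in range(20)], "nextPageState": "token-1"}},
    "token-1": {"data": {"documents": [{"_id": str(i)} for i in range(20, 25)], "nextPageState": None}},
}


def fake_send_find(body):
    state = body["find"].get("options", {}).get("pageState")
    return PAGES[state]


docs = fetch_all(fake_send_find, {"filter": {}})
print(len(docs))  # 25
```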

When you run a find command, you can include nested JSON objects that define the search criteria (sort or filter), projection, and other options.

If no particular order is required, omit the sort clause. For example, this request uses an empty filter and no sort:

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "filter": {}
  }
}' | jq

This example finds documents by performing a vector search:

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] },
    "projection": { "$vector": 1 },
    "options": {
      "includeSimilarity": true,
      "includeSortVector": false,
      "limit": 100
    }
  }
}' | jq

This request does the following:

  • sort compares the given vector, [0.15, 0.1, 0.1, 0.35, 0.55], against the vectors for documents in the collection, and then returns results ranked by similarity. The $vector key is a reserved property name for storing vector data.

  • projection requests that the response return the $vector for each document.

  • options.includeSimilarity requests that the response include the $similarity key with the numeric similarity score, which represents the closeness of the sort vector and the document’s vector.

  • options.includeSortVector is set to false, so the response doesn't include the sortVector. Set it to true only when sort uses $vector or $vectorize and you want the response to include the sort vector. This is particularly useful with $vectorize, because you don't know the generated vector in advance.

  • options.limit specifies the maximum number of documents to return. This example returns at most 100 documents. Without a lower limit, a vector search returns a single page of up to 1000 documents.

The projection and options settings can make the response more focused and potentially reduce the amount of data transferred.
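JSON embedded in shell strings is easy to get wrong. As an alternative, this sketch uses only the Python standard library to build the same vector-search request, and json.dumps guarantees a well-formed body. The endpoint, keyspace, collection, and token values are placeholders that mirror the curl example (the https:// scheme is assumed to be part of the API endpoint), and build_find_request is a hypothetical helper, not part of any DataStax client. The request is built but not sent.

```python
import json
import urllib.request

# Placeholders mirroring the curl example; replace with real values.
API_ENDPOINT = "https://ASTRA_DB_API_ENDPOINT"
KEYSPACE = "ASTRA_DB_KEYSPACE"
COLLECTION = "ASTRA_DB_COLLECTION"
TOKEN = "ASTRA_DB_APPLICATION_TOKEN"


def build_find_request(sort, projection=None, options=None):
    """Build (but do not send) a POST request for the Data API find command."""
    find = {"sort": sort}
    if projection is not None:
        find["projection"] = projection
    if options is not None:
        find["options"] = options
    body = json.dumps({"find": find}).encode("utf-8")
    url = f"{API_ENDPOINT}/api/json/v1/{KEYSPACE}/{COLLECTION}"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Token": TOKEN, "Content-Type": "application/json"},
        method="POST",
    )


request = build_find_request(
    sort={"$vector": [0.15, 0.1, 0.1, 0.35, 0.55]},
    projection={"$vector": 1},
    options={"includeSimilarity": True, "includeSortVector": False, "limit": 100},
)
print(request.get_method(), request.full_url)
print(json.loads(request.data))
# To actually send it: urllib.request.urlopen(request)
```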

Response
{
  "data": {
    "documents": [
      {
        "$similarity": 1,
        "$vector": [
          0.15,
          0.1,
          0.1,
          0.35,
          0.55
        ],
        "_id": "3"
      },
      {
        "$similarity": 0.9953563,
        "$vector": [
          0.15,
          0.17,
          0.15,
          0.43,
          0.55
        ],
        "_id": "18"
      },
      {
        "$similarity": 0.9732053,
        "$vector": [
          0.21,
          0.22,
          0.33,
          0.44,
          0.53
        ],
        "_id": "21"
      }
    ],
    "nextPageState": null
  }
}
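For a collection that uses the cosine metric, the $similarity values in this response are consistent with the formula (1 + cos θ) / 2, which maps cosine similarity from [-1, 1] onto [0, 1]. The sketch below recomputes the three scores from the vectors shown, assuming the collection uses the cosine metric (the response itself doesn't state the metric). The small differences from the response values come from the server working with lower-precision floats.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cosine_similarity_score(a, b):
    """Map cosine similarity from [-1, 1] onto [0, 1]."""
    return (1 + cosine(a, b)) / 2


sort_vector = [0.15, 0.1, 0.1, 0.35, 0.55]
returned = {
    "3": [0.15, 0.1, 0.1, 0.35, 0.55],
    "18": [0.15, 0.17, 0.15, 0.43, 0.55],
    "21": [0.21, 0.22, 0.33, 0.44, 0.53],
}

for doc_id, vector in returned.items():
    print(doc_id, round(cosine_similarity_score(sort_vector, vector), 7))
```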


© 2025 DataStax

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com