Sort clauses for collections
Sort and filter clauses can use only indexed fields. If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.
Data API commands, such as find, findOne, deleteOne, and updateOne, can use sort clauses to organize results based on similarity, or dissimilarity, to the given filter, such as a vector or field.
Additionally, you can use a projection to include specific document properties in the response.
A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.
- Python
- TypeScript
- Java
- curl
Python
When no particular order is required:
sort={} # (default when parameter not provided)
When sorting by a certain value in ascending/descending order:
from astrapy.constants import SortDocuments
# Ascending sort
sort={"field": SortDocuments.ASCENDING}
# Descending sort
sort={"field": SortDocuments.DESCENDING}
Be aware of the order when chaining multiple sorts. For example, when sorting first by a specific field and then by a specific subfield:
sort={
    "field": SortDocuments.ASCENDING,
    "subfield": SortDocuments.ASCENDING,
}
Modern Python versions (3.7 and later) preserve dictionary insertion order, but for clarity, consider using a collections.OrderedDict for chained sorts.
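The effect of key order in a chained sort can be illustrated locally, without a database. The following is a minimal sketch, not a description of how the Data API sorts internally: apply_sort is a hypothetical helper, the sample documents are made up, and ASCENDING = 1 and DESCENDING = -1 are assumed stand-ins for the SortDocuments values.

```python
from collections import OrderedDict

# Assumed stand-ins for the astrapy.constants.SortDocuments values.
ASCENDING = 1
DESCENDING = -1


def apply_sort(documents, sort):
    """Hypothetical local emulation of a chained sort clause.

    Keys listed first in `sort` take precedence. Python's sort is stable,
    so sorting by the keys in reverse order produces that precedence.
    """
    result = list(documents)
    for field, direction in reversed(list(sort.items())):
        result.sort(key=lambda doc: doc[field], reverse=(direction == DESCENDING))
    return result


# Made-up documents with ties on both fields.
docs = [
    {"tag": "A", "x": 1, "y": 2},
    {"tag": "B", "x": 2, "y": 1},
    {"tag": "C", "x": 1, "y": 1},
]

# Same keys, different order: two different sort clauses.
x_then_y = OrderedDict([("x", ASCENDING), ("y", ASCENDING)])
y_then_x = OrderedDict([("y", ASCENDING), ("x", ASCENDING)])

x_first = [d["tag"] for d in apply_sort(docs, x_then_y)]
y_first = [d["tag"] for d in apply_sort(docs, y_then_x)]
print(x_first)  # ['C', 'A', 'B']
print(y_first)  # ['C', 'B', 'A']
```

Both clauses contain the same keys, but the first key decides the primary order and the second only breaks ties, so the results differ.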
You can use sort to perform a vector search:
# Use the specified vector,
# And then sort by similarity to the given vector.
sort={"$vector": [0.4, 0.15, -0.5]}
# Generate a vector from a string,
# Run a vector search,
# And then sort by similarity to the given vector.
# Requires a valid vectorize integration.
sort={"$vectorize": "Text to vectorize"}
Sort example
from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection
filter = {"seq": {"$exists": True}}
for doc in collection.find(filter, projection={"seq": True}, limit=5):
    print(doc["seq"])
# will print e.g.:
# 37
# 35
# 10
# 36
# 27
cursor1 = collection.find(
    {},
    limit=4,
    sort={"seq": astrapy.constants.SortDocuments.DESCENDING},
)
[doc["_id"] for doc in cursor1]
# prints: ['97e85f81-...', '1581efe4-...', '...', '...']
cursor2 = collection.find({}, limit=3)
cursor2.distinct("seq")
# prints: [37, 35, 10]
collection.insert_many([
    {"tag": "A", "$vector": [4, 5]},
    {"tag": "B", "$vector": [3, 4]},
    {"tag": "C", "$vector": [3, 2]},
    {"tag": "D", "$vector": [4, 1]},
    {"tag": "E", "$vector": [2, 5]},
])
ann_tags = [
    document["tag"]
    for document in collection.find(
        {},
        sort={"$vector": [3, 3]},
        limit=3,
    )
]
ann_tags
# prints: ['A', 'B', 'C']
# (assuming the collection has metric VectorMetric.COSINE)
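To see why the vector search above returns A, B, and C under the cosine metric, you can rank the same vectors locally. This is a minimal brute-force sketch for illustration only: the cosine_similarity helper is hypothetical, and the database performs an approximate nearest-neighbor search rather than this exact computation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Same documents and query vector as the example above.
documents = [
    {"tag": "A", "$vector": [4, 5]},
    {"tag": "B", "$vector": [3, 4]},
    {"tag": "C", "$vector": [3, 2]},
    {"tag": "D", "$vector": [4, 1]},
    {"tag": "E", "$vector": [2, 5]},
]
query = [3, 3]

# Most similar first, then keep the top 3 (mirrors limit=3).
ranked = sorted(
    documents,
    key=lambda doc: cosine_similarity(doc["$vector"], query),
    reverse=True,
)
top_tags = [doc["tag"] for doc in ranked[:3]]
print(top_tags)  # ['A', 'B', 'C']
```

Cosine similarity depends only on the direction of each vector, not its length, which is why A ([4, 5]) ranks ahead of B ([3, 4]) for the query [3, 3].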
TypeScript
When no particular order is required:
{ sort: {} } // (default when parameter not provided)
When sorting by a certain value in ascending/descending order:
{ sort: { field: +1 } } // ascending
{ sort: { field: -1 } } // descending
Be aware of the order when chaining multiple sorts: ES2015+ guarantees that string keys keep their insertion order, so the sort keys apply in the order you write them. For example, when sorting first by a field and then by a specific subfield:
{ sort: { field: 1, subfield: 1 } }
You can use sort to perform a vector search:
// Use the specified vector,
// And then sort by similarity to the given vector.
{ sort: { $vector: [0.4, 0.15, -0.5] } }
// Generate a vector from a string,
// Run a vector search,
// And then sort by similarity to the given vector.
// Requires a valid vectorize integration
{ sort: { $vectorize: "Text to vectorize" } }
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
  // Insert some documents
  await collection.insertMany([
    { name: 'Jane', age: 25, $vector: [1.0, 1.0, 1.0, 1.0, 1.0] },
    { name: 'Dave', age: 40, $vector: [0.4, 0.5, 0.6, 0.7, 0.8] },
    { name: 'Jack', age: 40, $vector: [0.1, 0.9, 0.0, 0.5, 0.7] },
  ]);

  // Sort by age ascending, then by name descending (Jane, Jack, Dave)
  const sorted1 = await collection.find({}, { sort: { age: 1, name: -1 } }).toArray();
  console.log(sorted1.map(d => d.name));

  // Sort by vector distance (Jane, Dave, Jack)
  const sorted2 = await collection.find({}, { sort: { $vector: [1, 1, 1, 1, 1] } }).toArray();
  console.log(sorted2.map(d => d.name));
})();
Java
The sort() operations are optional. Use them only when needed.
Be aware of the order when chaining multiple sorts:
Sort s1 = Sorts.ascending("field1");
Sort s2 = Sorts.descending("field2");
FindOptions.Builder.sort(s1, s2);
You can use sort to perform a vector search:
// Use the specified vector,
// And then sort by similarity to the given vector.
FindOptions.Builder
.sort(new float[] {0.4f, 0.15f, -0.5f});
// Generate a vector from a string,
// Run a vector search,
// And then sort by similarity to the given vector.
// Requires a valid vectorize integration
FindOptions.Builder
.sort("Text to vectorize");
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOptions;
import com.datastax.astra.client.model.Sort;
import com.datastax.astra.client.model.Sorts;
import static com.datastax.astra.client.model.Filters.lt;
public class WorkingWithSorts {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Sort clause for a vector
        Sorts.vector(new float[] {0.25f, 0.25f, 0.25f, 0.25f, 0.25f});

        // Sort clause for other fields
        Sort s1 = Sorts.ascending("field1");
        Sort s2 = Sorts.descending("field2");

        // Build the sort clause
        new FindOptions().sort(s1, s2);

        // Adding vector
        new FindOptions().sort(new float[] {0.25f, 0.25f, 0.25f, 0.25f, 0.25f}, s1, s2);
    }
}
curl
When you run a Find command, you can append nested JSON objects that define the search criteria (sort or filter), projection, and other options.
If no particular order is required, you can search with an empty filter:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
  --header "Token: ASTRA_DB_APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "find": {
    "filter": {}
  }
}' | jq
This example finds documents by performing a vector search:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
  --header "Token: ASTRA_DB_APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "find": {
    "sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] },
    "projection": { "$vector": 1 },
    "options": {
      "includeSimilarity": true,
      "includeSortVector": false,
      "limit": 100
    }
  }
}' | jq
This request does the following:
- sort compares the given vector, [0.15, 0.1, 0.1, 0.35, 0.55], against the vectors for documents in the collection, and then returns results ranked by similarity. The $vector key is a reserved property name for storing vector data.
- projection requests that the response return the $vector for each document.
- options.includeSimilarity requests that the response include the $similarity key with the numeric similarity score, which represents the closeness of the sort vector and the document’s vector.
- options.includeSortVector is set to false to exclude the sortVector from the response. This option is only relevant if sort includes either $vector or $vectorize and you want the response to include the sort vector. This is particularly useful with $vectorize because you don’t know the sort vector in advance.
- options.limit specifies the maximum number of documents to return. This example limits the entire list of matching documents to 100 documents or fewer.

Vector search returns a single page of up to 1000 documents, unless you set a lower limit. Other searches (without $vector or $vectorize) return matching documents in batches of 20. Pagination occurs if there are more than 20 matching documents. For information about handling pagination, see Find documents.
The projection and options settings can make the response more focused and potentially reduce the amount of data transferred.
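If you assemble the request body in code rather than by hand, serializing a dictionary avoids JSON syntax slips such as trailing commas. The following is a minimal sketch using only the Python standard library. build_find_payload and its parameter names are hypothetical, not part of any client library, and the sketch builds the body without sending it.

```python
import json


def build_find_payload(vector, limit=100, include_similarity=True,
                       include_sort_vector=False, return_vector=True):
    """Hypothetical helper that assembles the body of a find request."""
    find = {
        "sort": {"$vector": list(vector)},
        "options": {
            "includeSimilarity": include_similarity,
            "includeSortVector": include_sort_vector,
            "limit": limit,
        },
    }
    if return_vector:
        # $vector is excluded by default, so request it explicitly.
        find["projection"] = {"$vector": 1}
    return {"find": find}


payload = build_find_payload([0.15, 0.1, 0.1, 0.35, 0.55])

# json.dumps always emits valid JSON, suitable for the --data argument.
body = json.dumps(payload, indent=2)
print(body)
```

The printed body matches the structure of the vector search request shown in the curl example.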
Response
{
  "data": {
    "documents": [
      {
        "$similarity": 1,
        "$vector": [
          0.15,
          0.1,
          0.1,
          0.35,
          0.55
        ],
        "_id": "3"
      },
      {
        "$similarity": 0.9953563,
        "$vector": [
          0.15,
          0.17,
          0.15,
          0.43,
          0.55
        ],
        "_id": "18"
      },
      {
        "$similarity": 0.9732053,
        "$vector": [
          0.21,
          0.22,
          0.33,
          0.44,
          0.53
        ],
        "_id": "21"
      }
    ],
    "nextPageState": null
  }
}
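A response like this one can be reduced to the fields you usually need: the document IDs, their similarity scores, and whether another page is available. The following is a minimal sketch using the Python standard library. summarize is a hypothetical helper, and the sample string is the response above with the $vector arrays removed for brevity.

```python
import json

# The response above, with the $vector arrays omitted for brevity.
response_text = """
{
  "data": {
    "documents": [
      {"$similarity": 1, "_id": "3"},
      {"$similarity": 0.9953563, "_id": "18"},
      {"$similarity": 0.9732053, "_id": "21"}
    ],
    "nextPageState": null
  }
}
"""


def summarize(response):
    """Hypothetical helper: return (id, similarity) pairs and a paging flag."""
    data = response["data"]
    hits = [(doc["_id"], doc.get("$similarity")) for doc in data["documents"]]
    # A null nextPageState means there are no further pages to request.
    has_more = data.get("nextPageState") is not None
    return hits, has_more


hits, has_more = summarize(json.loads(response_text))
for doc_id, similarity in hits:
    print(doc_id, similarity)
print("more pages:", has_more)
```

Because the results are already ranked by similarity, the first pair in hits is the closest match.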