Indexing
What is Storage-Attached Indexing and why do we use it?
Storage-Attached Indexing (SAI) provides high I/O throughput for serverless Astra databases with Vector Search. It is a highly scalable, globally distributed index that adds column-level indexes to any column of the vector data type.
We also use SAI because:
- It can define multiple indexes on the same database table.
- It can index both queries and content (large inputs such as documents, words, and images) to capture semantics.
- An SAI index can be defined on any column in the table, except a partition key that consists of a single column.
How does SAI work?
SAI uses Hierarchical Navigable Small World (HNSW), an algorithm for Approximate Nearest Neighbor (ANN) search, to create a hierarchy of graphs. Each level of the hierarchy is a navigable small-world graph.
For any given node (data point) in the graph, it is easy to find a path to any other node. The higher levels of the hierarchy have fewer nodes and are used for coarse navigation, while the lower levels have more nodes and are used for fine navigation. Such indexing structures enable fast retrieval by narrowing down the search space to potential matches.
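The coarse-to-fine navigation described above can be illustrated with a small sketch. This is not SAI's implementation: the layer sizes, the function names (`build_layer`, `greedy_search`, `search`), and the use of Euclidean distance are all assumptions made for illustration. Real HNSW assigns nodes to levels at random and maintains a candidate list during search, whereas this sketch uses fixed nested subsets and a plain greedy walk.

```python
import math
import random

def build_layer(points, ids, k):
    """Link each node to its k nearest nodes in the same layer (brute force)."""
    graph = {}
    for i in ids:
        others = sorted((j for j in ids if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        graph[i] = others[:k]
    # Make every edge bidirectional so the graph is easier to navigate.
    for i in ids:
        for j in list(graph[i]):
            if i not in graph[j]:
                graph[j].append(i)
    return graph

def greedy_search(graph, points, entry, query):
    """Move to any neighbor closer to the query until no neighbor improves."""
    current = entry
    improved = True
    while improved:
        improved = False
        for nb in graph[current]:
            if math.dist(points[nb], query) < math.dist(points[current], query):
                current = nb
                improved = True
    return current

def search(layers, points, query):
    """Descend from the sparsest layer to the densest one."""
    entry = next(iter(layers[0]))  # any node of the top layer
    for graph in layers:
        entry = greedy_search(graph, points, entry, query)
    return entry

rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(200)]

# Hypothetical nested layers: each upper layer is a subset of the one below.
ids_bottom = list(range(len(points)))
ids_middle = ids_bottom[:40]
ids_top = ids_middle[:8]
layers = [
    build_layer(points, ids_top, 3),      # few nodes: coarse navigation
    build_layer(points, ids_middle, 4),
    build_layer(points, ids_bottom, 6),   # all nodes: fine navigation
]

query = (0.3, 0.7)
found = search(layers, points, query)
exact = min(ids_bottom, key=lambda i: math.dist(points[i], query))
print("greedy result:", found, "exact nearest:", exact)
```

The greedy result is not guaranteed to match the exact nearest neighbor, which is the approximate part of ANN. The upper layers only serve to place the search close to the right region before the densest layer is examined.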
ANN finds the data points in a dataset that are closest (or most similar) to a given query point. Finding these neighbors can be computationally expensive, particularly when dealing with high-dimensional data. Therefore, ANN algorithms aim to find the nearest neighbors approximately, prioritizing speed and efficiency over exact accuracy.
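To make the speed-versus-accuracy trade-off concrete, the following sketch compares an exact search that scores every vector with a cheaper search that scores only a subset of candidates, and then measures how many of the true neighbors the cheaper search recovered (recall). The random candidate subset is only a stand-in for what an index such as HNSW would supply; the cosine similarity metric, the function names, and the dataset sizes are assumptions chosen for illustration.

```python
import math
import random

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def exact_knn(vectors, query, k):
    """Score every vector: exact, but the cost grows with the dataset size."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine_similarity(vectors[i], query),
                    reverse=True)
    return ranked[:k]

def candidate_knn(vectors, query, k, candidates):
    """Stand-in for an ANN index: score only the given candidate ids."""
    ranked = sorted(candidates,
                    key=lambda i: cosine_similarity(vectors[i], query),
                    reverse=True)
    return ranked[:k]

def recall(approx, exact):
    """Fraction of the true nearest neighbors that the approximate search found."""
    return len(set(approx) & set(exact)) / len(exact)

rng = random.Random(7)
vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(1000)]
query = [rng.gauss(0, 1) for _ in range(16)]

exact = exact_knn(vectors, query, k=10)

# Hypothetical candidate set: a quarter of the data, so about a quarter of the work.
candidates = rng.sample(range(len(vectors)), 250)
approx = candidate_knn(vectors, query, 10, candidates)

print(f"recall@10 = {recall(approx, exact):.2f}")
```

A random subset gives poor recall by design. A graph-based index aims to pick candidates that are already near the query, so it can examine a small fraction of the data while still recovering most of the true neighbors.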
For more information, see the SAI FAQ.