SAI concepts

Storage-Attached Indexing (SAI) is a highly scalable, globally distributed index for DataStax Astra DB (both serverless and classic) and DataStax Enterprise (DSE) databases.

The main advantages of SAI over existing indexes for Cassandra are that it:

  • enables vector search for AI applications

  • shares common index data across multiple indexes on the same table

  • alleviates write-time scalability issues

  • significantly reduces disk usage

  • delivers strong numeric range performance

  • supports zero-copy streaming of indexes

In fact, SAI provides the most indexing functionality available for Cassandra. SAI adds column-level indexes to any CQL table column of almost any CQL data type.
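
As a minimal sketch of what column-level indexing looks like in practice, the Python snippet below holds a few CQL statements that create SAI indexes on individual columns. The keyspace `demo`, the table `products`, its columns, and the index names are hypothetical and chosen only for illustration. The `session` argument is assumed to be any object with an `execute(statement)` method, such as a session from the DataStax Python driver.

```python
# Hypothetical schema for illustration only; keyspace, table, column,
# and index names are assumptions, not taken from the SAI documentation.
SCHEMA_STATEMENTS = [
    """
    CREATE TABLE IF NOT EXISTS demo.products (
        id uuid PRIMARY KEY,
        name text,
        price double,
        tags set<text>
    )
    """,
    # One SAI index per column that queries should be able to filter on.
    """
    CREATE CUSTOM INDEX IF NOT EXISTS products_price_idx
    ON demo.products (price)
    USING 'StorageAttachedIndex'
    """,
    # Text index with the optional case-sensitivity and normalization settings.
    """
    CREATE CUSTOM INDEX IF NOT EXISTS products_name_idx
    ON demo.products (name)
    USING 'StorageAttachedIndex'
    WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true'}
    """,
    # Index on a collection column, which allows CONTAINS filters.
    """
    CREATE CUSTOM INDEX IF NOT EXISTS products_tags_idx
    ON demo.products (tags)
    USING 'StorageAttachedIndex'
    """,
]


def apply_schema(session):
    """Run each statement in order against the given session-like object."""
    for statement in SCHEMA_STATEMENTS:
        session.execute(statement)
```

Each index is created independently, so columns can be indexed or left unindexed according to the queries the application actually needs.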

SAI enables queries that filter based on:

  • vector embeddings

  • AND logic

  • OR logic (Astra DB)

  • numeric range

  • non-variable length numeric types

  • text type equality

  • CONTAINS logic (for collections)

  • tokenized data

  • row-aware query path

  • case sensitivity (optional)

  • unicode normalization (optional)
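
To make a few of these filter types concrete, the sketch below lists example CQL queries as Python strings. It assumes a hypothetical table `demo.products` with SAI indexes on `name` (text), `price` (double), `tags` (set<text>), and `embedding` (a 3-dimensional float vector). None of these names come from the original text, and whether the vector query is available depends on the database version in use.

```python
# Example queries against a hypothetical demo.products table.
# Every filtered column is assumed to have an SAI index.
EXAMPLE_QUERIES = {
    # Numeric range combined with text equality through AND logic.
    "and_with_range": (
        "SELECT id, name FROM demo.products "
        "WHERE price >= 10 AND price < 50 AND name = 'widget'"
    ),
    # CONTAINS logic on an indexed collection column.
    "collection_contains": (
        "SELECT id FROM demo.products WHERE tags CONTAINS 'sale'"
    ),
    # Vector search: nearest neighbours of a query embedding.
    "vector_search": (
        "SELECT id FROM demo.products "
        "ORDER BY embedding ANN OF [0.1, 0.2, 0.3] LIMIT 5"
    ),
}


def run_query(session, name):
    """Execute one named example query and return its rows as a list."""
    return list(session.execute(EXAMPLE_QUERIES[name]))
```

With a connected driver session, `run_query(session, "collection_contains")` would return the rows whose `tags` set contains `'sale'`.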

Advantages

Defining one or more SAI indexes on the columns of a database table lets you run performant queries that filter on those indexed columns. Compared especially with relational databases and complex indexing schemes, SAI shortens the path to a working application.

SAI is deeply integrated with the storage engine of Cassandra. The SAI functionality indexes the in-memory memtables and the on-disk SSTables as they are written, and resolves the differences between those indexes at read time. Consequently, SAI adds very little operational complexity on top of the core database. From snapshot creation, to schema management, to data expiration, SAI integrates tightly with the capabilities and mechanisms that the core database already provides.
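
The idea of indexing memtables and SSTables separately and reconciling them at read time can be illustrated with a toy model. The Python sketch below is not SAI's actual implementation; the class `ToyTable` and its methods are hypothetical, and real index segments are far more sophisticated. It only shows the general pattern: each segment keeps its own index, a query gathers candidates from every segment, and stale candidates are dropped by checking the current row value.

```python
from collections import defaultdict


class ToyTable:
    """Toy single-column table with one index per memtable/SSTable segment."""

    def __init__(self):
        self.rows = {}                       # row key -> latest column value
        self.memtable_index = defaultdict(set)
        self.sstable_indexes = []            # one frozen index per flushed segment

    def upsert(self, row_key, value):
        # Writes update the row and the in-memory index only;
        # older on-disk index segments are never modified.
        self.rows[row_key] = value
        self.memtable_index[value].add(row_key)

    def flush(self):
        # Flushing turns the memtable index into an immutable segment.
        self.sstable_indexes.append(dict(self.memtable_index))
        self.memtable_index = defaultdict(set)

    def candidates(self, value):
        # Union of matches from the memtable index and every flushed segment.
        hits = set(self.memtable_index.get(value, ()))
        for segment in self.sstable_indexes:
            hits |= segment.get(value, set())
        return hits

    def query(self, value):
        # Read-time resolution: keep only rows whose current value still matches.
        return sorted(k for k in self.candidates(value) if self.rows.get(k) == value)
```

For example, if row `r1` is written with `"red"`, flushed, and then overwritten with `"blue"`, the flushed segment still lists `r1` under `"red"`, but `query("red")` filters it out because the current value no longer matches.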

SAI is also fully compatible with zero-copy streaming (ZCS). Thus, when you bootstrap or decommission nodes in a cluster, the indexes are fully streamed with the SSTables and not serialized or rebuilt on the receiving node’s end.

At its core, SAI is a filtering engine. It simplifies data modeling and client applications that would otherwise rely heavily on maintaining multiple query-specific tables.

Performance

SAI outperforms any other indexing method available for Cassandra.

SAI provides more functionality than secondary indexing (2i), using a fraction of the disk space, and reducing the total cost of ownership (TCO) for disk, infrastructure, and operations. SAI is also more functional than DataStax Enterprise (DSE) Search indexing for the same reasons. For write path performance, SAI outperforms both of the other indexing methods in throughput (43% for 2i, 86% for DSE Search) and significantly outperforms both in latency (230% for 2i, 670% for DSE Search). For read path performance, SAI performs at least as well as both of the other indexing methods in throughput and latency.


