What is SAI?

Use Storage-Attached Indexing (SAI) to create multiple secondary indexes on the same table.

Introduction

Storage-Attached Indexing (SAI) is a highly scalable, globally distributed index for Apache Cassandra® that is available for DataStax Astra and DataStax Enterprise (DSE) databases.

Use SAI to add column-level indexes to any column and almost any Cassandra data type, including text, numeric, and collection types. This functionality enables you to filter queries using CQL equality, range (numeric only), and CONTAINS semantics. SAI provides more functionality than Cassandra secondary indexes, is faster at writes than any Cassandra or DSE Search index, and uses significantly less disk space.

Examples in this guide, including the SAI quick start and Examine SAI column index and query rules, show how to create and use SAI indexes with the CREATE CUSTOM INDEX command.
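As a minimal sketch of the three query styles described above, the following CQL creates one SAI index per column and then filters on each. The keyspace demo (assumed to exist already), the table cyclists, its columns, and the index names are all hypothetical and chosen only for illustration; check the quick start for the options supported by your version.

```cql
-- Hypothetical schema; assumes a keyspace named "demo" already exists.
CREATE TABLE IF NOT EXISTS demo.cyclists (
  id uuid PRIMARY KEY,
  lastname text,
  age int,
  teams set<text>
);

-- One SAI index per column that queries will filter on.
CREATE CUSTOM INDEX IF NOT EXISTS cyclists_lastname_sai
  ON demo.cyclists (lastname) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX IF NOT EXISTS cyclists_age_sai
  ON demo.cyclists (age) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX IF NOT EXISTS cyclists_teams_sai
  ON demo.cyclists (teams) USING 'StorageAttachedIndex';

-- Equality on a text column.
SELECT * FROM demo.cyclists WHERE lastname = 'Smith';

-- Range on a numeric column.
SELECT * FROM demo.cyclists WHERE age >= 23;

-- CONTAINS on a collection column.
SELECT * FROM demo.cyclists WHERE teams CONTAINS 'Team Alpha';
```

Each index is declared independently, so you can add or drop an index on one column without touching the others.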

Advantages

After you define one or more SAI indexes on columns of a database table, you can run performant queries that filter on those indexed columns.

SAI uses significantly less disk space than other existing Cassandra or DSE Search indexes.

Compared to relational databases and complex indexing schemes in particular, SAI shortens your path to a working application. It does so by removing commonly encountered stumbling blocks, such as Solr configuration details and esoteric tuning parameters.

SAI is deeply integrated with the storage engine of Cassandra. The SAI functionality indexes the in-memory memtables and the on-disk SSTables as they are written, and resolves the differences between those indexes at read time. Consequently, SAI adds very little operational complexity on top of the core database. From snapshot creation, to schema management, to data expiration, SAI integrates tightly with the capabilities and mechanisms that the core database already provides.

SAI is also fully compatible with a feature known as zero-copy streaming. This feature means that as you bootstrap or decommission nodes in the cluster, the indexes are fully streamed with the SSTables and do not have to be serialized or rebuilt on the receiving node's end.

At its core, SAI is a filtering engine that simplifies data modeling and client applications that would otherwise rely heavily on maintaining multiple query-specific tables.
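To illustrate the data-modeling point, here is a hedged sketch in which a single table with two SAI indexes answers queries that might otherwise call for separate query-specific tables (for example, one keyed by customer and one keyed by status). The keyspace demo (assumed to exist), the table orders, and all column and index names are hypothetical. The last query assumes that your SAI version supports combining indexed columns with AND.

```cql
-- Hypothetical schema; assumes a keyspace named "demo" already exists.
CREATE TABLE IF NOT EXISTS demo.orders (
  order_id uuid PRIMARY KEY,
  customer text,
  status text
);

-- Two SAI indexes on the same table, one per filterable column.
CREATE CUSTOM INDEX IF NOT EXISTS orders_customer_sai
  ON demo.orders (customer) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX IF NOT EXISTS orders_status_sai
  ON demo.orders (status) USING 'StorageAttachedIndex';

-- Each access pattern is served by the same table.
SELECT * FROM demo.orders WHERE customer = 'acme';
SELECT * FROM demo.orders WHERE status = 'shipped';

-- Assumption: AND across SAI-indexed columns is available in your version.
SELECT * FROM demo.orders WHERE customer = 'acme' AND status = 'shipped';
```

With this layout the application writes each order once, instead of keeping several denormalized copies in sync.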

DataStax SAI performance testing results as of DSE 6.8.3:

Table 1. Comparing SAI performance with other indexing solutions

General
Compared to 2i indexes: SAI provides more functionality compared to Secondary indexes (2is) at a fraction of the disk space.
Compared to DSE Search indexes: SAI provides significant Total Cost of Ownership (TCO) advantages for indexing from disk space, infrastructure, and operations.

Write Path
Compared to 2i indexes: Throughput: SAI 43% better than 2is. Latency: SAI 230% better than 2is.
Compared to DSE Search indexes: Throughput: SAI 86% better than DSE Search. Latency: SAI 670% better than DSE Search.

Read Path
Compared to 2i indexes: Throughput: SAI slightly better than 2is. Latency: SAI similar to 2is.
Compared to DSE Search indexes: Throughput: SAI slightly better than DSE Search. Latency: SAI similar to DSE Search.

Astra cloud users

SAI indexing features work identically whether you choose to self-manage your database with DSE or use the zero-Ops DataStax Astra cloud. For cloud users, if you haven't already, follow the preliminary steps in the Astra documentation to create and launch your Astra database. In those steps, you'll specify a keyspace. While still in the Astra UI, you can open the CQL Console and enter DDL and DML commands to manage the database.

Ready to get started?

See the SAI FAQs. Then follow the steps in the SAI quick start to learn how to add SAI indexes on database tables.