Indexing with SSTable attached secondary indexes (SASI)

Explains what an SSTable Attached Secondary Index (SASI) is.

Attention: SASI indexes in DSE are experimental. DataStax does not support SASI indexes for production.

SASI is significantly less resource-intensive than regular secondary indexes, using less memory, disk, and CPU. It enables prefix and contains queries on strings, similar to the SQL LIKE 'foo%' and LIKE '%foo%' patterns, as shown in SELECT. It also supports SPARSE indexing to improve the performance of queries over large, dense number ranges such as time series data.
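As a minimal sketch of the three index modes mentioned above, the following CQL creates one SASI index per mode and queries each. The table cyclist_name, its columns, and the index names are hypothetical and chosen only for illustration; the example assumes a keyspace has already been selected with USE.

```cql
-- Hypothetical table used only for illustration.
CREATE TABLE IF NOT EXISTS cyclist_name (
    id uuid PRIMARY KEY,
    firstname text,
    lastname text,
    created_at timestamp
);

-- PREFIX is the default mode: supports LIKE 'foo%'.
CREATE CUSTOM INDEX cyclist_firstname_prefix
    ON cyclist_name (firstname)
    USING 'org.apache.cassandra.index.sasi.SASIIndex';

-- CONTAINS mode: supports LIKE '%foo%'.
CREATE CUSTOM INDEX cyclist_lastname_contains
    ON cyclist_name (lastname)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = { 'mode': 'CONTAINS' };

-- SPARSE mode: intended for dense numeric ranges such as timestamps.
CREATE CUSTOM INDEX cyclist_created_sparse
    ON cyclist_name (created_at)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = { 'mode': 'SPARSE' };

-- Prefix match on firstname.
SELECT * FROM cyclist_name WHERE firstname LIKE 'Mar%';

-- Substring match on lastname.
SELECT * FROM cyclist_name WHERE lastname LIKE '%son%';

-- Range query served by the SPARSE index.
SELECT * FROM cyclist_name
    WHERE created_at >= '2017-01-01' AND created_at < '2017-02-01';
```

Note that CQL string literals use single quotes and that LIKE takes the pattern directly, without an equals sign.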

SASI takes advantage of the database's write-once, immutable, ordered data model to build indexes when data is flushed from the memtable to disk. The SASI index data structures are built in memory as the SSTable is written, and are flushed to disk as sequential writes before the SSTable write completes. One index file is written for each indexed column.

SASI supports all queries already supported by CQL. It adds support for the LIKE operator through the PREFIX and CONTAINS index modes, and for efficient numeric range queries through the SPARSE mode. If ALLOW FILTERING is used, SASI also supports queries with multiple predicates combined with AND. With SASI, the usual performance pitfalls of filtering do not apply, because no filtering is actually performed even when ALLOW FILTERING is specified.
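A multi-predicate query of the kind described above might look like the following sketch. The table rider_profile, its columns, and the index names are hypothetical; the example assumes a keyspace has already been selected with USE and that both columns in the WHERE clause carry SASI indexes.

```cql
-- Hypothetical table used only for illustration.
CREATE TABLE IF NOT EXISTS rider_profile (
    id uuid PRIMARY KEY,
    first_name text,
    age int
);

-- SASI index in the default PREFIX mode on the text column.
CREATE CUSTOM INDEX rider_first_name_idx
    ON rider_profile (first_name)
    USING 'org.apache.cassandra.index.sasi.SASIIndex';

-- SASI index on the numeric column, used for the range predicate.
CREATE CUSTOM INDEX rider_age_idx
    ON rider_profile (age)
    USING 'org.apache.cassandra.index.sasi.SASIIndex';

-- Two predicates joined with AND; CQL requires ALLOW FILTERING here.
SELECT * FROM rider_profile
    WHERE first_name LIKE 'M%' AND age < 30
    ALLOW FILTERING;
```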

SASI is implemented using memory-mapped B+ trees, an efficient data structure for indexes. B+ trees allow range queries to perform quickly. SASI generates an index for each SSTable. Some key features that arise from this design are:

SASI can reference offsets in the data file, bypassing the Bloom filter and partition indexes to go directly to where data is stored.
When SSTables are compacted, new indexes are generated automatically.

SASI does not support collections. Regular secondary indexes can be built for collections.
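For a collection column, a regular secondary index can be used instead, as in this sketch. The table cyclist_teams, its columns, and the index name are hypothetical; the example assumes a keyspace has already been selected with USE.

```cql
-- Hypothetical table with a collection column, for illustration only.
CREATE TABLE IF NOT EXISTS cyclist_teams (
    id uuid PRIMARY KEY,
    lastname text,
    teams set<text>
);

-- Regular (non-SASI) secondary index on the set column.
CREATE INDEX cyclist_teams_idx ON cyclist_teams (teams);

-- Find rows whose set contains the given element.
SELECT * FROM cyclist_teams WHERE teams CONTAINS 'Rabobank';
```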