Configure SAI indexes
Configuring your Cassandra/DataStax Enterprise (DSE) environment for Storage-Attached Indexing (SAI) often does not require extensive customization of dse.yaml and cassandra.yaml files, or changes to how you use most options on the CREATE TABLE command. However, there are a few important settings to know.
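For context, SAI indexes are created with the CQL CREATE CUSTOM INDEX command; a minimal sketch, with placeholder keyspace, table, column, and index names:

```sql
-- Placeholder names; SAI uses the StorageAttachedIndex class
CREATE CUSTOM INDEX IF NOT EXISTS my_column_idx
  ON my_keyspace.my_table (my_column)
  USING 'StorageAttachedIndex';
```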
Limits on the number of SAI indexes per table and total per cluster
By default in DataStax Astra DB and DataStax Enterprise (DSE) 6.8.3 and later:

- The maximum number of SAI indexes per table is 10. The limit is set by sai_indexes_per_table_failure_threshold in cassandra.yaml.
- The maximum number of SAI indexes in the entire cluster is 50, as set by sai_indexes_total_failure_threshold in cassandra.yaml.

See the guardrails section of the Astra DB reference topic.
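If your workload requires more indexes, both guardrails can be raised in cassandra.yaml; a sketch showing the documented defaults (the exact placement of these properties may vary by release):

```yaml
# cassandra.yaml — SAI guardrail defaults
sai_indexes_per_table_failure_threshold: 10
sai_indexes_total_failure_threshold: 50
```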
Increase file cache above the default value
By default, the file cache’s file_cache_size_in_mb value is calculated as 50% of the -XX:MaxDirectMemorySize setting.
File cache is also known as chunk cache.
The file_cache_size_in_mb value can be defined explicitly in cassandra.yaml.
This default for file_cache_size_in_mb may result in suboptimal performance because DSE cannot take full advantage of available memory.
DataStax recommends:

- Increase -XX:MaxDirectMemorySize, leaving approximately 15-20% of memory for the OS and other in-memory structures.
- In cassandra.yaml, explicitly set file_cache_size_in_mb to 75% of that value.

In testing, this configuration resulted in improved indexing performance across read, write, and mixed read/write scenarios.
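To illustrate the arithmetic with hypothetical numbers (not a sizing recommendation): if -XX:MaxDirectMemorySize is set to 51 GB after reserving memory for the OS, 75% of that value is 39168 MB:

```yaml
# JVM setting (hypothetical): -XX:MaxDirectMemorySize=51g
# cassandra.yaml — explicitly set to 75% of MaxDirectMemorySize:
file_cache_size_in_mb: 39168   # 51 * 1024 * 0.75 = 39168
```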
Compaction strategies
Read queries perform better with compaction strategies that produce fewer SSTables.
For most environments that include SAI indexes, DataStax recommends using the SizeTieredCompactionStrategy (STCS), which is the default.
This strategy triggers a minor compaction when there are a number of similar-sized SSTables on disk, as configured by the table subproperty min_threshold.
A minor compaction does not involve all the tables in a keyspace.
For details, see Configuring compaction.
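The min_threshold subproperty can be set on the table; a sketch with placeholder names (a min_threshold of 4 is the usual default):

```sql
-- Placeholder table; STCS triggers a minor compaction once
-- min_threshold similar-sized SSTables exist on disk.
CREATE TABLE IF NOT EXISTS my_keyspace.my_table (
    id uuid PRIMARY KEY,
    value text
) WITH compaction = {
    'class' : 'SizeTieredCompactionStrategy',
    'min_threshold' : '4' };
```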
For time series data, an alternative is the TimeWindowCompactionStrategy (TWCS).
TWCS compacts SSTables using a series of time windows.
While in a time window, TWCS compacts all SSTables flushed from memory into larger SSTables using STCS.
At the end of the time window, all of these SSTables are compacted into a single SSTable.
Then the next time window starts and the process repeats.
The duration of the time window is the only setting required.
See TimeWindowCompactionStrategy.
For more information about TWCS, see How is data maintained?.
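A TWCS table definition can be sketched as follows; the table schema and the one-day window are illustrative, not a recommendation:

```sql
-- Placeholder time series table with a one-day compaction window
CREATE TABLE IF NOT EXISTS my_keyspace.sensor_readings (
    sensor_id uuid,
    reading_time timestamp,
    reading double,
    PRIMARY KEY (sensor_id, reading_time)
) WITH compaction = {
    'class' : 'TimeWindowCompactionStrategy',
    'compaction_window_unit' : 'DAYS',
    'compaction_window_size' : '1' };
```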
In general, do not use LeveledCompactionStrategy (LCS) unless your index queries restrict the token range, either directly or by providing a restriction on the partition key.
However, if you decide to use LCS:
- The 160 MB default for the CREATE TABLE command’s sstable_size_in_mb option, described in this topic, may result in suboptimal performance for index queries that do not restrict on token range or partition key.
- While even higher values may be appropriate, depending on your hardware, DataStax recommends at least doubling the default value of sstable_size_in_mb.
Example:

CREATE TABLE IF NOT EXISTS my_keyspace.my_table
  ...
  WITH compaction = {
    'class' : 'LeveledCompactionStrategy',
    'sstable_size_in_mb' : '320' };
After increasing the MB value, observe whether the query performance improves on tables with SAI indexes.
To observe any performance deltas, per query, look at the QueryLatency and SSTableIndexesHit data in the DSE query metrics.
See Using DSE Metrics Collector.
Using a larger value reserves more disk space, because the SSTables are larger, and the ones destined for replacement will use more space while being compacted. However, the larger value results in having fewer SSTables, which lowers query latencies. Each SAI index should ultimately consume less space on disk because of better long-term compression with the larger indexes.
If query performance degrades on large (sstable_max_size ~2 GB) SAI-indexed SSTables when the workload is not dominated by reads but is experiencing increased write amplification, consider using the Unified Compaction Strategy (UCS).
Enabling Asynchronous I/O (AIO)
In prior DSE 6.x releases, DataStax recommended disabling AIO and setting file_cache_size_in_mb to 512 for search workloads, to improve indexing and query performance.
Starting with DSE 6.8.0, DataStax recommends enabling AIO and using the default file cache size, which is calculated as 50% of -XX:MaxDirectMemorySize.
Test the performance in your development environment.
If the settings result in improved performance, consider making the changes in production.
The changed recommendation is based on DataStax performance testing results and is specific to DSE 6.8.0 and later releases. DSE enhancements were made so that the buffer pool no longer over-allocates memory.
By default, AIO is enabled.
However, if you previously disabled AIO in your DSE 6.7.x or 6.0.x configuration, pass -Ddse.io.aio.enabled=true to DSE at startup.
If you decide instead to disable AIO, DataStax recommends a cache size of at least 2GB, of which 512 MB should be reserved for in-flight reads. Example in cassandra.yaml:
file_cache_size_in_mb: 2048
inflight_data_overhead_in_mb: 512
With those properties, the size of the buffer pool will be 2048 MB, while the size of the cache will be 2048 - 512, or 1536 MB.
If AIO is disabled and the file cache size is small, for example less than 2GB, the default may not be sufficient for workloads that keep reads in flight for a prolonged time.
If you notice errors in the logs indicating that the buffer pool was exhausted, consider increasing the space reserved for in-flight reads by raising the inflight_data_overhead_in_mb value.
Setting timeout values for range reads and writes
Environments with heavy mixed read/write workloads are often sensitive to Threads Per Core (TPC) starvation, especially given the default timeouts for range reads and write operations.
The relevant properties in cassandra.yaml are:
- range_request_timeout_in_ms (default: 10 seconds)
- write_request_timeout_in_ms (default: 2 seconds)
- read_request_timeout_in_ms (default: 5 seconds)
For details, see Network timeout settings.
The timeout defaults may be appropriate for your apps.
However, on saturated nodes with heavy mixed reads/writes, these defaults could cause issues, especially if database writes are unable to complete.
For example, if range reads consistently take longer than writes, you may observe WriteTimeoutExceptions because the longer-running reads are dominating the writes.
If WriteTimeoutExceptions occur, DataStax recommends that you consider changing the default settings in development:
- Decrease range_request_timeout_in_ms
- Increase write_request_timeout_in_ms
Because the "appropriate" timeouts are application dependent, it’s not possible to suggest precise values for all.
As a starting point, though, first try decreasing range_request_timeout_in_ms by half from its default.
Then under peak load, test whether overall throughput improved.
As needed, gradually adjust the timeouts to suit your app’s requirements.
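Following that starting point, the adjustments in cassandra.yaml might look like this; the halved range read timeout follows the guidance above, while the write timeout value is only an illustrative increase:

```yaml
# cassandra.yaml — a starting point for tuning, not a recommendation
range_request_timeout_in_ms: 5000    # halved from the 10000 ms default
write_request_timeout_in_ms: 4000    # illustrative increase from the 2000 ms default
read_request_timeout_in_ms: 5000     # unchanged default
```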
Completing write operations is obviously critical. The balance with read operations depends on your response-time SLA with users.
Configuring memtable flush writers
SAI indexes the in-memory memtables and the on-disk SSTables as they are written, and resolves the differences between those indexes at read time.
The default is 8 memtable flush writers.
Generally, the default value is appropriate and does not need adjusting.
If the memtable_flush_writers value is set too low, write operations may stall.
If this occurs, try increasing memtable_flush_writers to 10 in your development test environment.
Run tests under average and peak loads.
Observe the results and adjust memtable_flush_writers further if necessary (such as increasing to 12), until the setting prevents stalled writes.
When you determine a suitable memtable_flush_writers value, consider setting it in production.
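The property is set in cassandra.yaml, for example:

```yaml
# cassandra.yaml — try 10 if writes stall with the default of 8
memtable_flush_writers: 10
```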
About SAI encryption
When Transparent Data Encryption (TDE) is enabled, encrypting SAI indexes does not require any special configuration or CQL commands. An SAI index’s on-disk components are simply additional SSTable data. To protect sensitive user data when TDE is enabled, including any present in the table’s partition key values, SAI encrypts all parts of the index that contain user data: the trie index data for strings and the kd-tree data for numerics. By design, SAI does not encrypt non-user data such as postings metadata or SSTable-level offsets and tokens.
What’s next?
The next topic shows how you can monitor SAI indexes, including metrics and options for pre-configured dashboards.