Configure SAI indexes

Configuring the CQL environment for Storage-Attached Indexing (SAI) requires some important customization of cassandra.yaml files. Changes can also be implemented using most options on the CREATE TABLE command. However, there are a few important settings to know.

Limits on the number of SAI indexes

Various limits apply to the number of SAI indexes, depending on the product. The parameters can be set in the cassandra.yaml file to adjust the limits.

SAI has limits that might affect a cluster:

sai_indexes_total_failure_threshold: removed, index_count = 100 used instead
sai_indexes_per_table_failure_threshold: warn_threshold = -1, fail_threshold = -1
sai_indexes_total_failure_threshold: commented out, set to 100
sai_indexes_per_table_failure_threshold: commented out, set to 10

Increase file cache above the default value

The file cache is also known as the chunk cache. The chunk cache will store recently accessed sections of the SSTable in-memory as uncompressed buffers. When not set, the default is calculated as 1/4 of (system RAM - max heap). When not set, the default is the smaller of 1/4 of heap or 512MB.

The default setting may not be sufficient for read workloads with heavily used SAI indexes. If the file cache is too small, the memory used for the file cache may be insufficient to hold the index data. In this case, the memory size should be increased using --XX:MaxDirectMemorySize, and also increasing the file cache size to 75% of the memory size in the cassandra.yaml file.

Configuring memtable flush writers

SAI indexes the in-memory memtables and the on-disk SSTables as they are written, and resolves the differences between those indexes at read time.

The default is 8 memtable flush writers. The default is 2 memtable flush writers. Generally, the default value is appropriate and does not need adjusting.

If the memtable_flush_writers value is set too low, write operations may stall.

If this occurs, try increasing memtable_flush_writers by a value of 2 in your development test environment. Run tests under average and peak loads. Observe the results and adjust memtable_flush_writers further if necessary, until the setting prevents stalled writes. When you determine a suitable memtable_flush_writers value, consider setting it in production.

See memtable_flush_writers. See memtable_flush_writers.

Setting timeout values for range reads and writes

Environments with heavy mixed read/write workloads are often sensitive to Threads Per Core (TPC) starvation, especially given the default timeouts for range reads and write operations. The default timeouts for range reads and write operations can cause issues in environments with heavy mixed read/write workloads.

The relevant properties in cassandra.yaml are:

range_request_timeout_in_ms (default: 10 seconds)
write_request_timeout_in_ms (default: 2 seconds)
read_request_timeout_in_ms (default: 5 seconds)

For details, see Network timeout settings.

range_request_timeout (default: 10000 ms)
write_request_timeout (default: 2000 ms)
read_request_timeout (default: 5000 ms)

For details, see Network timeout settings.

The timeout defaults may be appropriate for your apps. However, on saturated nodes with heavy mixed reads/writes, these defaults could cause issues especially if database writes are unable to complete. For example, if range reads consistently take longer than writes, you may observe WriteTimeoutExceptions because the longer-running reads are dominating the writes. If WriteTimeoutExceptions occur, consider changing the default settings in development: * Decrease range_request_timeout_in_ms * Increase write_request_timeout_in_ms * Decrease range_request_timeout * Increase write_request_timeout

Because the "appropriate" timeouts are application dependent, it’s not possible to suggest precise values for all. As a starting point, though, first try decreasing the range request timeout by half from its default. Then under peak load, test whether overall throughput improved. As needed, gradually adjust the timeouts to suit your app’s requirements.

Completing write operations is obviously critical. The balance with read operations depends on your response-time SLA with users.

Compaction strategies

Any compaction strategy can be used with SAI indexes. LCS may require some tuning.

Read queries perform better with compaction strategies that produce fewer SSTables.

Make the following changes to the cassandra.yaml file:

The 160 MB default for the CREATE TABLE command’s sstable_size_in_mb sstable_size option, described in this topic, may result in suboptimal performance for index queries that do not restrict on token range or partition key.
While even higher values may be appropriate, depending on your hardware, the recommendation is to at least doubling the default value of sstable_size_in_mb.

Example:

CREATE TABLE IF NOT EXISTS my_keyspace.my_table
.
.
.
   WITH compaction = {
     'class' : 'LeveledCompactionStrategy',
     'sstable_size_in_mb' : '320' };

sstable_size.

Example:

CREATE TABLE IF NOT EXISTS my_keyspace.my_table
.
.
.
   WITH compaction = {
     'class' : 'LeveledCompactionStrategy',
     'sstable_size' : '320MB' };

After increasing the MB value, observe whether the query performance improves on tables with SAI indexes. To observe any performance deltas, per query, look at the QueryLatency and SSTableIndexesHit data in the query metrics. See Monitor cluster metrics with DSE Metrics Collector.

Using a larger value reserves more disk space, because the SSTables are larger, and the ones destined for replacement will use more space while being compacted. However, the larger value results in having fewer SSTables, which lowers query latencies. Each SAI index should ultimately consume less space on disk because of better long-term compression with the larger indexes.

If query performance degrades on large (sstable_max_size approximately 2GB) SAI indexed SSTables when the workload is not dominated by reads, but is experiencing increased write amplification, consider using Unified Compaction Strategy (UCS). Unified Compaction Strategy (UCS).

Enabling Asynchronous I/O (AIO)

In prior DSE 6.x releases, DataStax recommended disabling AIO and setting file_cache_size_in_mb to 512 for search workloads, to improve indexing and query performance. Starting with DSE 6.8.0, DataStax recommends enabling AIO and using the default file cache size, which is calculated as 50% of -XX:MaxDirectMemorySize. Test the performance in your development environment. If the settings result in improved performance, consider making the changes in production.

The changed recommendation is based on DataStax performance testing results and is specific to DSE 6.8.0 and later releases. DSE enhancements were made so that the buffer pool no longer over-allocates memory.

By default, AIO is enabled. However if you previously disabled AIO in your DSE 6.7.x or 6.0.x configuration, pass -Ddse.io.aio.enabled=true to DSE at startup.

If you decide instead to disable AIO, DataStax recommends a cache size of at least 2GB, of which 512 MB should be reserved for in-flight reads. Example in cassandra.yaml:

file_cache_size_in_mb: 2048
inflight_data_overhead_in_mb: 512

With those properties, the size of the buffer pool will be 2048 MB, while the size of the cache will be 2048 - 512, or 1536 MB.

If AIO is disabled and the file cache size is small, for example less than 2GB, the default may not be sufficient for workloads that keep reads in flight for a prolonged time. If you notice errors in the logs that indicate the buffer pool was exhausted, consider increasing the space for in-flight reads by setting the inflight_data_overhead_in_mb property. Or, as recommended above, enable AIO and use the calculated default file cache size.

About SAI encryption

When Transparent Data Encryption (TDE) is enabled, encrypting SAI indexes does not require any special configuration or CQL commands. With SAI indexes, its on-disk components are simply additional SSTable data. To protect sensitive user data when TDE is enabled, including any present in the table’s partition key values, SAI encrypts all parts of the index that contain user data. That is, the trie index data for strings and the kd-tree data for numerics. By design, SAI does not encrypt non-user data such as postings metadata or SSTable-level offsets and tokens.