Configure SAI indexes
Configuring the CQL environment for Storage-Attached Indexing (SAI) requires some customization of the cassandra.yaml file.
Many changes can also be made with options on the CREATE TABLE command.
The sections below cover the settings that matter most.
Limits on the number of SAI indexes
The number of SAI indexes is limited, and the limits depend on the product.
These limits may or may not affect a given cluster.
To adjust them, set the following parameters in the cassandra.yaml file:
- sai_indexes_total_failure_threshold: removed, index_count = 100 used instead
- sai_indexes_per_table_failure_threshold: warn_threshold = -1, fail_threshold = -1
Increase file cache above the default value
The file cache is also known as the chunk cache. The chunk cache stores recently accessed sections of the SSTable in memory as uncompressed buffers.
The default setting may not be sufficient for read workloads with heavily used SAI indexes.
If the file cache is too small, it cannot hold the index data that those reads need.
In this case, increase the maximum direct memory size with the JVM option -XX:MaxDirectMemorySize, and also increase the file cache size in the cassandra.yaml file to 75% of that memory size.
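The 75% guideline above is simple arithmetic. The following is a minimal sketch, not part of any product tooling; the function name file_cache_size_mb and the 8192 MB example value are assumptions made for illustration.

```python
def file_cache_size_mb(max_direct_memory_mb: int, fraction: float = 0.75) -> int:
    """Return a file cache size in MB as a fraction of the direct memory size.

    max_direct_memory_mb is the value given to -XX:MaxDirectMemorySize,
    expressed in MB. The 0.75 default follows the 75% guideline in the text.
    """
    if max_direct_memory_mb <= 0:
        raise ValueError("max_direct_memory_mb must be positive")
    return int(max_direct_memory_mb * fraction)


# Example (hypothetical value): 8 GB of direct memory.
print(file_cache_size_mb(8192))
```

The result is the number to use for the file cache size setting in cassandra.yaml; check the exact property name and unit for your product version.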
Configuring memtable flush writers
SAI indexes the in-memory memtables and the on-disk SSTables as they are written, and resolves the differences between those indexes at read time.
Generally, the default memtable_flush_writers value is appropriate and does not need adjusting.
If the memtable_flush_writers value is set too low, write operations may stall.
If this occurs, try increasing memtable_flush_writers by 2 in your development test environment.
Run tests under average and peak loads.
Observe the results and adjust memtable_flush_writers further if necessary, until the setting prevents stalled writes.
When you determine a suitable memtable_flush_writers value, consider setting it in production.
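The tuning procedure above is an iterative search. The following sketch shows one way to express it, assuming you supply your own load test. The function tune_flush_writers, the writes_stall callback, and the max_writers cap are hypothetical names introduced here and are not part of any Cassandra API.

```python
from typing import Callable


def tune_flush_writers(
    start: int,
    writes_stall: Callable[[int], bool],
    step: int = 2,
    max_writers: int = 32,
) -> int:
    """Raise the candidate memtable_flush_writers value in steps of `step`
    until the supplied load test no longer reports stalled writes.

    writes_stall(n) is a hypothetical callback: it should apply the value n
    in a development environment, run average and peak load tests, and
    return True if writes still stall.
    """
    writers = start
    while writes_stall(writers):
        if writers + step > max_writers:
            raise RuntimeError("no suitable value found below max_writers")
        writers += step
    return writers


# Stand-in for a real load test: pretend writes stall below 5 writers.
print(tune_flush_writers(2, lambda n: n < 5))
```

The cap exists only so the loop terminates if stalls have a different cause; choose it to suit your hardware.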
Setting timeout values for range reads and writes
The relevant properties in cassandra.yaml are:
The timeout defaults may be appropriate for your apps.
However, on saturated nodes with heavy mixed reads and writes, these defaults could cause issues, especially if database writes are unable to complete.
For example, if range reads consistently take longer than writes, you may observe WriteTimeoutExceptions because the longer-running reads are dominating the writes.
If WriteTimeoutExceptions occur, consider changing the default settings in development:
Because the "appropriate" timeouts are application dependent, it’s not possible to suggest precise values for every application. As a starting point, though, try halving the range request timeout from its default. Then, under peak load, test whether overall throughput improved. As needed, gradually adjust the timeouts to suit your app’s requirements.
Completing write operations is obviously critical. The balance with read operations depends on your response-time SLA with users.
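The starting point described above can be written down as a small calculation. This is a sketch under stated assumptions: the function names are hypothetical, and the 10000 ms figure is only an example value, so read the actual default from your own cassandra.yaml.

```python
def starting_range_timeout_ms(default_ms: int) -> int:
    """Return half of the current range request timeout, as a first candidate."""
    if default_ms <= 0:
        raise ValueError("default_ms must be positive")
    return default_ms // 2


def throughput_improved(before_ops: float, after_ops: float) -> bool:
    """Compare overall throughput measured under peak load before and after
    the change. Both numbers come from your own load tests."""
    return after_ops > before_ops


# Example value only; use the timeout configured in your cassandra.yaml.
candidate = starting_range_timeout_ms(10_000)
print(candidate)
```

If throughput does not improve with the halved value, adjust gradually from there rather than making another large change.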
Compaction strategies
Any compaction strategy can be used with SAI indexes. LeveledCompactionStrategy (LCS) may require some tuning.
Read queries perform better with compaction strategies that produce fewer SSTables.
Make the following changes to the cassandra.yaml file:
- The 160 MB default for the CREATE TABLE command’s option, described in this topic, may result in suboptimal performance for index queries that do not restrict on token range or partition key.
- While even higher values may be appropriate, depending on your hardware, the recommendation is to at least double the default value of
After increasing the MB value, observe whether the query performance improves on tables with SAI indexes.
To observe any per-query performance deltas, look at the QueryLatency and SSTableIndexesHit data in the query metrics.
Using a larger value reserves more disk space, because the SSTables are larger, and the ones destined for replacement will use more space while being compacted. However, the larger value results in having fewer SSTables, which lowers query latencies. Each SAI index should ultimately consume less space on disk because of better long-term compression with the larger indexes.
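To see why a larger SSTable size lowers the number of SSTables a query may touch, a rough estimate helps. The sketch below assumes every SSTable is filled to the configured size and ignores compaction levels and in-flight compactions. The function name and the 100000 MB data size are hypothetical.

```python
import math


def estimated_sstable_count(data_size_mb: float, sstable_size_mb: int) -> int:
    """Rough upper-level estimate: total data divided by the target SSTable
    size, rounded up. Real counts vary with compaction state."""
    if sstable_size_mb <= 0:
        raise ValueError("sstable_size_mb must be positive")
    return math.ceil(data_size_mb / sstable_size_mb)


# Hypothetical table holding 100000 MB of data per node.
for size_mb in (160, 320):
    print(size_mb, estimated_sstable_count(100_000, size_mb))
```

Doubling the size from 160 MB to 320 MB roughly halves the estimated count, which is the effect the text attributes to lower query latencies.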
If query performance degrades on large (sstable_max_size approximately 2 GB) SAI indexed SSTables when the workload is not dominated by reads, but is experiencing increased write amplification, consider using