Configuring DSE Tiered Storage
Configuring the data movement between storage media takes place at the node level and the schema level:
-
Configure the storage strategies, and the tiers that define the storage locations for each strategy, at the node level in the dse.yaml file. Multiple configurations can be defined for different use cases. Multiple heterogeneous disk configurations are supported.
DataStax recommends local configuration testing before deploying cluster wide.
-
Configure the age policy at the schema level.
The only supported data usage policy is partition age. Tier age thresholds are set when a table is created with the compaction strategy TieredCompactionStrategy.
The data sets used by DSE Tiered Storage can be very large. Search limitations and known Apache Solr™ issues apply.
Procedure
-
In the dse.yaml file on each node, uncomment the tiered_storage_options section.
-
For each tiered storage strategy, define the configuration name, the storage tiers, and the data directory locations for each tier.
-
Define storage tiers in priority order with the fastest storage media in the tier that is listed first.
-
For each tier, define the data directory locations.
Use this format, where <config_name> is the tiered storage strategy that you reference with the CREATE TABLE or ALTER TABLE statements. The <config_name> must be the same across all nodes:
tiered_storage_options:
    <config_name>:
        tiers:
            - paths:
                - <path_to_directory1>
            - paths:
                - <path_to_directory2>
where:
-
<config_name> is the configurable name of the tiered storage configuration strategy. For example: strategy1.
-
tiers is the section that configures the storage tiers. The order in which the tiers are listed defines their priority.
-
paths is the section of file paths that define the data directories for this tier of the disk configuration.
List the fastest storage media first. These paths are used to store only data that is configured to use tiered storage and are independent of any settings in the cassandra.yaml file. For example, the tiered storage configuration named strategy1 has three different storage tiers ordered in priority (the first tier listed has highest priority):
tiered_storage_options:
    strategy1:
        tiers:
            - paths:
                - /mnt1
                - /mnt2
            - paths:
                - /mnt3
                - /mnt4
            - paths:
                - /mnt5
                - /mnt6
-
To apply the tiered storage strategies to selected tables, use CREATE TABLE or ALTER TABLE statements.
For example, to apply tiered storage to table ks.tbl:
CREATE TABLE ks.tbl (k INT, c INT, v INT, PRIMARY KEY (k, c))
WITH COMPACTION = {
    'class': 'org.apache.cassandra.db.compaction.TieredCompactionStrategy',
    'tiering_strategy': 'TimeWindowStorageStrategy',
    'config': 'strategy1',
    'max_tier_ages': '3600,7200'
};
Set timing metrics with the compaction options:
-
class
'class': 'org.apache.cassandra.db.compaction.TieredCompactionStrategy' configures a table to use tiered storage.
-
tiering_strategy
'tiering_strategy': 'TimeWindowStorageStrategy' uses TimeWindowStorageStrategy (TWSS) to determine which tier to move the data to. TWSS is a DSE Tiered Storage strategy that uses TimeWindowCompactionStrategy (TWCS).
-
config
'config': 'strategy1' specifies the strategy to use from those configured in the dse.yaml file, in this case strategy1.
-
max_tier_ages
'max_tier_ages': '3600,7200' uses the values in a comma-separated list to define the maximum age per tier, in seconds, where:
-
3600 restricts the first tier to data that is aged an hour (3600 seconds) or less.
-
7200 restricts the second tier to data that is aged two hours (7200 seconds) or less.
-
All other data is routed to the data directory locations that are defined for the third tier.
For TimeWindowStorageStrategy (TWSS), DataStax recommends that one tier be defined for each age that is specified for max_tier_ages, plus another tier for older data. However, DataStax Enterprise uses only the tiers that are configured in the table schema and the dse.yaml file.
An implicit tier exists that represents the oldest data. For example, for a strategy with three tiers in dse.yaml:
-
'max_tier_ages': '3600,7200' uses three tiers. Tier 0 holds data newer than 3600 seconds, tier 1 holds data between 3600 and 7200 seconds old, and tier 2 holds data older than 7200 seconds.
-
'max_tier_ages': '3600' uses only the first two tiers.
-
'max_tier_ages': '3600,7200,10800' uses all three tiers, but ignores the last value. Any data that does not belong in the first two tiers goes to the third tier, whether or not the data is older than 10800 seconds.
-
The CQL compaction subproperties for TWCS are also supported.
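The max_tier_ages rules described in the procedure can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not how DataStax Enterprise implements tiering: the function names select_tier and parse_max_tier_ages are hypothetical, and the handling of data whose age is exactly equal to a threshold is an assumption based on the "or less" wording above.

```python
from typing import List


def parse_max_tier_ages(value: str) -> List[int]:
    """Parse a max_tier_ages string such as '3600,7200' into seconds."""
    return [int(part) for part in value.split(",") if part.strip()]


def select_tier(age_seconds: int, max_tier_ages: str, configured_tiers: int) -> int:
    """Return the zero-based tier index for data of the given age.

    Hypothetical helper that mirrors the rules in the text; it is not
    part of any DSE API.
    """
    if configured_tiers < 1:
        raise ValueError("at least one tier must be configured")
    thresholds = parse_max_tier_ages(max_tier_ages)
    # Only (configured_tiers - 1) thresholds can be honored: the last
    # configured tier implicitly holds the oldest data, so any extra
    # values are ignored.
    usable = thresholds[: configured_tiers - 1]
    for tier, max_age in enumerate(usable):
        # Assumption: data exactly at the threshold stays in the younger tier.
        if age_seconds <= max_age:
            return tier
    # Older than every usable threshold: falls through to the next tier.
    return len(usable)


# Three tiers configured in dse.yaml, as in the strategy1 example.
for ages in ("3600,7200", "3600", "3600,7200,10800"):
    print(ages, [select_tier(a, ages, 3) for a in (1800, 5000, 9000, 20000)])
# 3600,7200 [0, 1, 2, 2]
# 3600 [0, 1, 1, 1]
# 3600,7200,10800 [0, 1, 2, 2]
```

The last two lines of output show the two edge cases from the text: with a single age only the first two tiers receive data, and with three ages the final value (10800) has no effect because the third tier already takes everything older than 7200 seconds.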