Configuring DSE Tiered Storage
Configuring the data movement between storage media takes place at the node level and the schema level:
- Configure the storage strategies at the node level in the dse.yaml file. Each strategy defines its tiers, and each tier defines its storage locations. Use OpsCenter Lifecycle Manager to run a Configure job to push the configuration to applicable nodes.
  Multiple configurations can be defined for different use cases. Multiple heterogeneous disk configurations are supported.
  DataStax recommends testing the configuration locally before deploying it cluster-wide.
- Configure the age policy at the schema level.
  The only supported data usage policy is partition age. Tier age thresholds are set when a table is created with the compaction strategy TieredCompactionStrategy.
The data sets used by DSE Tiered Storage can be very large. Search limitations and known Apache Solr™ issues apply.
Procedure
Where is the dse.yaml file?
The location of the dse.yaml file depends on the type of installation:
Installation Type | Location |
---|---|
Package installations + Installer-Services installations | |
Tarball installations + Installer-No Services installations | |
- In the dse.yaml file on each node, uncomment the tiered_storage_options section.
- For each tiered storage strategy, define the configuration name, the storage tiers, and the data directory locations for each tier.
  - Define storage tiers in priority order, with the fastest storage media in the tier that is listed first.
  - For each tier, define the data directory locations.
    Use this format, where config_name is the tiered storage strategy that you reference in CREATE TABLE or ALTER TABLE statements. The config_name must be the same across all nodes:

        tiered_storage_options:
            config_name:
                tiers:
                    - paths:
                        - path_to_directory1
                    - paths:
                        - path_to_directory2

where:
- config_name is the configurable name of the tiered storage configuration strategy. For example: strategy1.
- tiers is the section that defines the storage tiers. Each tier lists its paths, and the order in which the tiers are listed defines their priority.
- paths is the section of file paths that define the data directories for this tier of the disk configuration.
  Typically, list the fastest storage media first. These paths are used only to store data that is configured to use tiered storage. These paths are independent of any settings in the cassandra.yaml file. For example, the tiered storage configuration named strategy1 has three storage tiers ordered by priority (the first tier listed has the highest priority):

        tiered_storage_options:
            strategy1:
                tiers:
                    - paths:
                        - /mnt1
                        - /mnt2
                    - paths:
                        - /mnt3
                        - /mnt4
                    - paths:
                        - /mnt5
                        - /mnt6

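Because the config_name must be the same across all nodes, it can help to compare the tiered storage section of each node before deploying. The following is a minimal sketch, not a DataStax tool: it assumes each node's dse.yaml has already been parsed into a Python dictionary by a YAML parser of your choice, and the function names and node labels are hypothetical.

```python
from typing import Dict, List, Set


def tier_paths(dse_config: dict, config_name: str) -> List[List[str]]:
    """Return the data directories of each tier, in the priority order listed."""
    strategy = dse_config["tiered_storage_options"][config_name]
    return [list(tier["paths"]) for tier in strategy["tiers"]]


def inconsistent_config_names(node_configs: Dict[str, dict]) -> Set[str]:
    """Return config names that are defined on some nodes but not on all of them."""
    name_sets = [
        set(config.get("tiered_storage_options", {}))
        for config in node_configs.values()
    ]
    if not name_sets:
        return set()
    return set.union(*name_sets) - set.intersection(*name_sets)


# Hypothetical parsed configuration that mirrors the strategy1 example.
node_a = {
    "tiered_storage_options": {
        "strategy1": {
            "tiers": [
                {"paths": ["/mnt1", "/mnt2"]},
                {"paths": ["/mnt3", "/mnt4"]},
                {"paths": ["/mnt5", "/mnt6"]},
            ]
        }
    }
}
# Hypothetical node where the section was left empty.
node_b = {"tiered_storage_options": {}}

print(tier_paths(node_a, "strategy1"))
print(inconsistent_config_names({"node_a": node_a, "node_b": node_b}))
```

An empty result from inconsistent_config_names means every node defines the same set of strategy names. The sketch compares names only; it does not check that the paths exist on disk.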
- To apply the tiered storage strategies to selected tables, use CREATE TABLE or ALTER TABLE statements. For example, to apply tiered storage to table ks.tbl:

        CREATE TABLE ks.tbl (k INT, c INT, v INT, PRIMARY KEY (k, c))
        WITH COMPACTION = {
            'class': 'org.apache.cassandra.db.compaction.TieredCompactionStrategy',
            'tiering_strategy': 'TimeWindowStorageStrategy',
            'config': 'strategy1',
            'max_tier_ages': '3600,7200'
        };

Set timing metrics with the compaction options:
- class
  'class':'org.apache.cassandra.db.compaction.TieredCompactionStrategy' configures a table to use tiered storage.
- tiering_strategy
  'tiering_strategy': 'TimeWindowStorageStrategy' uses TimeWindowStorageStrategy (TWSS) to determine which tier to move the data to. TWSS is a DSE Tiered Storage strategy that uses TimeWindowCompactionStrategy (TWCS).
- config
  'config': 'strategy1' specifies the strategy that is configured in the dse.yaml file, in this case strategy1.
- max_tier_ages
  'max_tier_ages': '3600,7200' uses the values in a comma-separated list to define the maximum age per tier, in seconds, where:
  - 3600 restricts the first tier to data that is aged an hour (3600 seconds) or less.
  - 7200 restricts the second tier to data that is aged two hours (7200 seconds) or less.
  - All other data is routed to the data directory locations that are defined for the third tier.
For TimeWindowStorageStrategy (TWSS), DataStax recommends that one tier be defined for each age that is specified for max_tier_ages, plus another tier for older data. However, DataStax Enterprise uses only the tiers that are configured in the table schema and the dse.yaml file.
An implicit tier exists that represents the oldest data. For example, for a strategy with three tiers in dse.yaml:
- 'max_tier_ages': '3600,7200' uses three tiers. Tier 0 holds data newer than 3600 seconds, tier 1 holds data between 3600 and 7200 seconds old, and tier 2 holds data older than 7200 seconds.
- 'max_tier_ages': '3600' uses only the first two tiers.
- 'max_tier_ages': '3600,7200,10800' uses all three tiers, but ignores the last value. Any data that does not belong in the first two tiers goes to the third tier, whether or not the data is older than 10800 seconds.
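The routing rules described for max_tier_ages can be sketched as a small function. This is an illustration of the documented behavior, not DSE code. It assumes three tiers are configured in dse.yaml, as in strategy1, and that data exactly at a threshold stays in the younger tier ("an hour or less"). The function names are hypothetical.

```python
from typing import List


def parse_max_tier_ages(value: str) -> List[int]:
    """Parse a max_tier_ages string such as '3600,7200' into seconds."""
    return [int(part) for part in value.split(",") if part.strip()]


def tier_for_age(age_seconds: int, max_tier_ages: str, configured_tiers: int) -> int:
    """Return the zero-based tier index for data of the given age."""
    ages = parse_max_tier_ages(max_tier_ages)
    # Only tiers that exist in dse.yaml can be used. The last configured tier
    # is the implicit tier for the oldest data, so any threshold that would
    # apply to it (or beyond it) is ignored.
    usable = ages[: configured_tiers - 1]
    for tier, max_age in enumerate(usable):
        # Assumption: the boundary is inclusive, so an age equal to the
        # threshold stays in the younger tier.
        if age_seconds <= max_age:
            return tier
    # Older than every usable threshold: route to the implicit oldest tier.
    return len(usable)


for setting in ("3600,7200", "3600", "3600,7200,10800"):
    tiers = [tier_for_age(age, setting, configured_tiers=3) for age in (1800, 5000, 9000)]
    print(setting, tiers)
```

With '3600' only tiers 0 and 1 are ever returned, and with '3600,7200,10800' the value 10800 has no effect because the third tier already receives all data older than 7200 seconds.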
The CQL compaction subproperties for TWCS are also supported.
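The procedure mentions ALTER TABLE as an alternative to CREATE TABLE but shows only the latter. The sketch below renders an ALTER TABLE statement with the same compaction options as the CREATE TABLE example. It assumes that ALTER TABLE accepts the same options map, which the procedure implies but does not show, and the helper name is hypothetical. It does no escaping, so use it only with trusted values.

```python
from typing import List


def alter_table_tiered_storage(table: str, config: str, max_tier_ages: List[int]) -> str:
    """Render an ALTER TABLE statement that applies a tiered storage strategy."""
    options = {
        "class": "org.apache.cassandra.db.compaction.TieredCompactionStrategy",
        "tiering_strategy": "TimeWindowStorageStrategy",
        # Must match a config_name defined under tiered_storage_options in dse.yaml.
        "config": config,
        # Comma-separated maximum ages, in seconds, one per tier.
        "max_tier_ages": ",".join(str(age) for age in max_tier_ages),
    }
    rendered = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {table} WITH COMPACTION = {{{rendered}}};"


statement = alter_table_tiered_storage("ks.tbl", "strategy1", [3600, 7200])
print(statement)
```

The printed statement can be pasted into cqlsh after checking that the named strategy exists in dse.yaml on every node.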