Configuring DSE Tiered Storage
Documentation on configuring DSE Tiered Storage to automate the smart movement of data between storage media.
- At the node level, configure the storage strategies, and the tiers that define the storage locations for each strategy, in the dse.yaml file.
Multiple configurations can be defined for different use cases. Multiple heterogeneous disk configurations are supported.
DataStax recommends local configuration testing before deploying cluster wide.
- Configure the age policy at the schema level.
The only supported data usage policy is partition age. Tier age thresholds are set when a table is created with the compaction strategy TieredCompactionStrategy.
dse.yaml
The location of the dse.yaml file depends on the type of installation:
| Installation type | Location |
| Package installations | /etc/dse/dse.yaml |
| Tarball installations | installation_location/resources/dse/conf/dse.yaml |
Procedure
- In the dse.yaml file on each node, uncomment the tiered_storage_options section.
- For each tiered storage strategy, define the configuration name, the storage tiers, and the data directory locations for each tier.
- Define storage tiers in priority order with the fastest storage media in the tier that is listed first.
- For each tier, define the data directory locations.
Use this format, where config_name is the tiered storage strategy that you reference in CREATE TABLE or ALTER TABLE statements. The config_name must be the same across all nodes:

    tiered_storage_options:
        config_name:
            tiers:
                - paths:
                    - path_to_directory1
                - paths:
                    - path_to_directory2

where:
- config_name is the configurable name of the tiered storage configuration strategy. For example: strategy1.
- tiers is the section that defines the storage tiers. Each tier lists its own paths, and the order in which the tiers are listed defines their priority. Typically, list the tier with the fastest storage media first.
- paths is the section of file paths that define the data directories for this tier of the disk configuration. These paths are used only to store data that is configured to use tiered storage. These paths are independent of any settings in the cassandra.yaml file.
For example, the tiered storage configuration named strategy1 has three different storage tiers ordered in priority (the first tier listed has highest priority):

    tiered_storage_options:
        strategy1:
            tiers:
                - paths:
                    - /mnt1
                    - /mnt2
                - paths:
                    - /mnt3
                    - /mnt4
                - paths:
                    - /mnt5
                    - /mnt6

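Because local testing is recommended before a cluster-wide rollout, a small pre-flight check of the tier paths can be useful. The sketch below is a hypothetical helper, not part of DSE: `check_tier_paths` is an invented name, it takes the tiers as a plain list of path lists (mirroring the `tiers` section above) rather than parsing dse.yaml, and it assumes that each data directory should already exist and be writable by the user that runs it.

```python
import os


def check_tier_paths(tiers):
    """Return a list of problems found in a tier layout (empty if none).

    `tiers` is a list of tiers in priority order; each tier is a list of
    directory paths, mirroring the `tiers` / `paths` nesting in dse.yaml.
    Hypothetical helper: the checks are assumptions, not DSE rules.
    """
    problems = []
    for index, paths in enumerate(tiers):
        if not paths:
            problems.append(f"tier {index}: no paths listed")
        for path in paths:
            if not os.path.isdir(path):
                problems.append(f"tier {index}: {path} is not a directory")
            elif not os.access(path, os.W_OK):
                problems.append(f"tier {index}: {path} is not writable")
    return problems


if __name__ == "__main__":
    # The strategy1 layout from the example above.
    strategy1 = [["/mnt1", "/mnt2"], ["/mnt3", "/mnt4"], ["/mnt5", "/mnt6"]]
    for problem in check_tier_paths(strategy1):
        print(problem)
```

Run the check as the same operating system user that runs DSE, since writability depends on the calling user.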
- To apply the tiered storage strategies to selected tables, use CREATE TABLE or ALTER TABLE statements.
For example, to apply tiered storage to table ks.tbl:

    CREATE TABLE ks.tbl (k INT, c INT, v INT, PRIMARY KEY (k, c))
    WITH COMPACTION = {
        'class': 'org.apache.cassandra.db.compaction.TieredCompactionStrategy',
        'tiering_strategy': 'TimeWindowStorageStrategy',
        'config': 'strategy1',
        'max_tier_ages': '3600,7200'};

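The same compaction map can be applied to an existing table with ALTER TABLE. When several tables need the setting, it may help to generate the statement. The following is a minimal sketch: `tiered_compaction_cql` is a hypothetical helper name, it only builds a string, and it assumes that the table name and config name are trusted values that need no quoting or escaping.

```python
def tiered_compaction_cql(table, config, max_tier_ages):
    """Build an ALTER TABLE statement that enables tiered storage.

    table         -- keyspace-qualified table name, e.g. "ks.tbl"
    config        -- strategy name defined under tiered_storage_options
    max_tier_ages -- iterable of maximum ages per tier, in seconds
    Hypothetical helper: inputs are assumed trusted and are not escaped.
    """
    options = {
        "class": "org.apache.cassandra.db.compaction.TieredCompactionStrategy",
        "tiering_strategy": "TimeWindowStorageStrategy",
        "config": config,
        # CQL expects the ages as one comma-separated string.
        "max_tier_ages": ",".join(str(age) for age in max_tier_ages),
    }
    body = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {table} WITH COMPACTION = {{{body}}};"


if __name__ == "__main__":
    print(tiered_compaction_cql("ks.tbl", "strategy1", [3600, 7200]))
```

The printed statement can be reviewed and then run in cqlsh.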
Set timing metrics with the compaction options:
- class
'class':'org.apache.cassandra.db.compaction.TieredCompactionStrategy' configures a table to use tiered storage.
- tiering_strategy
'tiering_strategy': 'TimeWindowStorageStrategy' uses TimeWindowStorageStrategy (TWSS) to determine which tier to move the data to. TWSS is a DSE Tiered Storage strategy that uses TimeWindowCompactionStrategy (TWCS).
- config
'config': 'strategy1' specifies to use the strategy that is configured in the dse.yaml file, in this case strategy1.
- max_tier_ages
'max_tier_ages': '3600,7200' uses the values in a comma-separated list to define the maximum age per tier, in seconds, where:
- 3600 restricts the first tier to data that is aged an hour (3600 seconds) or less.
- 7200 restricts the second tier to data that is aged two hours (7200 seconds) or less.
- All other data is routed to the data directory locations that are defined for the third tier.
Note: For TimeWindowStorageStrategy (TWSS), DataStax recommends that one tier be defined for each time age that is specified for max_tier_ages, plus another tier for older data. However, DataStax Enterprise uses only the tiers that are configured in the table schema and the dse.yaml file. An implicit tier exists that represents the oldest data. For example, for a strategy with three tiers in dse.yaml:
- 'max_tier_ages': '3600,7200' uses three tiers. Tier 0 holds data newer than 3600 seconds, tier 1 holds data between 3600 seconds and 7200 seconds, and tier 2 holds data older than 7200 seconds.
- 'max_tier_ages': '3600' uses only the first two tiers.
- 'max_tier_ages': '3600,7200,10800' uses all three tiers, but ignores the last value. Any data that did not belong in the first two tiers goes to the third tier, whether the data was older than 10800 seconds or not.
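The routing rules in this note can be summarized in a short sketch. This illustrates the behavior described above and is not the DSE implementation: `tier_for_age` and `configured_tiers` are hypothetical names, ages are assumed to be listed in ascending order, and data exactly at a threshold is assumed to stay in the younger tier ("or less").

```python
def tier_for_age(age_seconds, max_tier_ages, configured_tiers):
    """Return the zero-based tier index for data of the given age.

    age_seconds      -- age of the data, in seconds
    max_tier_ages    -- comma-separated string, as in the table schema
    configured_tiers -- number of tiers defined for the strategy in dse.yaml
    """
    ages = [int(part) for part in max_tier_ages.split(",") if part.strip()]
    # One tier per age plus the implicit oldest tier, capped by dse.yaml.
    usable_tiers = min(len(ages) + 1, configured_tiers)
    # Ages beyond the last usable boundary are ignored.
    thresholds = ages[:usable_tiers - 1]
    for tier, limit in enumerate(thresholds):
        if age_seconds <= limit:
            return tier
    # Older than every threshold: the last usable tier takes the data.
    return len(thresholds)


if __name__ == "__main__":
    for setting in ("3600,7200", "3600", "3600,7200,10800"):
        tiers = [tier_for_age(age, setting, 3) for age in (1800, 5000, 9000, 20000)]
        print(setting, tiers)
```

With three configured tiers, '3600' never routes data past tier 1, and '3600,7200,10800' sends everything older than 7200 seconds to tier 2 regardless of the 10800 value.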