Configure DSE Tiered Storage
Configuring the data movement between storage media and DataStax Enterprise (DSE) takes place at the node level and the schema level:
-
Configure the storage strategies, and the tiers that define their storage locations, at the node level in the dse.yaml file. Use OpsCenter Lifecycle Manager to run a Configure job to push the configuration to applicable nodes.
Multiple configurations can be defined for different use cases. Multiple heterogeneous disk configurations are supported.
DataStax recommends testing the configuration locally before deploying it cluster-wide.
-
Configure the age policy at the schema level.
The only supported data usage policy is partition age. Tier age thresholds are set when a table is created with the compaction strategy TieredCompactionStrategy.
The data sets used by DSE Tiered Storage can be very large. Search limitations and known Apache Solr™ issues apply.
-
In the dse.yaml file on each node, uncomment the tiered_storage_options section.
-
For each tiered storage strategy, define the configuration name, the storage tiers, and the data directory locations for each tier:
-
Define storage tiers in priority order with the fastest storage media in the tier that is listed first.
-
For each tier, define the data directory locations:
    tiered_storage_options:
        CONFIG_NAME:
            tiers:
                - paths:
                    - PATH_TO_DIRECTORY
                - paths:
                    - PATH_TO_DIRECTORY

Replace CONFIG_NAME with the name of the tiered storage strategy, such as strategy1, that you will reference in CREATE TABLE or ALTER TABLE statements. This name must be the same across all nodes.
In the tiers section, list your storage tiers in order from highest to lowest priority, starting with the fastest storage media. Each tier has a paths entry that contains one or more data directory locations (PATH_TO_DIRECTORY) for that tier of the disk configuration.
These paths are used only to store data that is configured to use tiered storage. They are independent of any settings in the cassandra.yaml file.
In the following example, the tiered storage configuration is named strategy1, and it has three storage tiers, where /mnt1 and /mnt2 are the highest priority storage locations:

    tiered_storage_options:
        strategy1:
            tiers:
                - paths:
                    - /mnt1
                    - /mnt2
                - paths:
                    - /mnt3
                    - /mnt4
                - paths:
                    - /mnt5
                    - /mnt6
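As a rough illustration of the layout described above, the following Python sketch checks a tiered storage configuration that has already been loaded into a dictionary. The function name check_tiered_storage_options and the specific checks (at least one tier per strategy, at least one path per tier) are assumptions made for this example; they are not part of DSE.

```python
def check_tiered_storage_options(options):
    """Return {strategy_name: [paths of tier 0, paths of tier 1, ...]}.

    `options` mirrors the tiered_storage_options section of dse.yaml after
    it has been parsed into plain dicts and lists (hypothetical input).
    Raises ValueError if a strategy has no tiers or a tier has no paths.
    """
    summary = {}
    for name, strategy in options.items():
        tiers = strategy.get("tiers") or []
        if not tiers:
            raise ValueError(f"{name}: no tiers defined")
        tier_paths = []
        for index, tier in enumerate(tiers):
            paths = tier.get("paths") or []
            if not paths:
                raise ValueError(f"{name}: tier {index} has no paths")
            tier_paths.append(list(paths))
        summary[name] = tier_paths
    return summary


# The strategy1 example from this page, written as a Python dictionary.
example = {
    "strategy1": {
        "tiers": [
            {"paths": ["/mnt1", "/mnt2"]},  # highest priority, fastest media
            {"paths": ["/mnt3", "/mnt4"]},
            {"paths": ["/mnt5", "/mnt6"]},  # lowest priority
        ]
    }
}

result = check_tiered_storage_options(example)
print(len(result["strategy1"]))  # number of tiers in strategy1
```

A check of this kind can be run against each node's configuration before deployment, since the strategy name must be the same across all nodes.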
-
To apply the tiered storage strategies to selected tables, use CREATE TABLE or ALTER TABLE statements.
For example, to apply a tiered storage strategy named strategy1 to a table named ks.tbl:

    CREATE TABLE ks.tbl (k INT, c INT, v INT, PRIMARY KEY (k, c))
    WITH COMPACTION = {
        'class': 'org.apache.cassandra.db.compaction.TieredCompactionStrategy',
        'tiering_strategy': 'TimeWindowStorageStrategy',
        'config': 'strategy1',
        'max_tier_ages': '3600,7200'
    };

The following WITH COMPACTION properties are used to configure DSE Tiered Storage:
-
class: To use tiered storage, this must be set to 'org.apache.cassandra.db.compaction.TieredCompactionStrategy'.
-
tiering_strategy: Set to 'TimeWindowStorageStrategy' to use the TimeWindowStorageStrategy (TWSS) to determine which tier to move the data to. TWSS is a DSE Tiered Storage strategy that uses TimeWindowCompactionStrategy (TWCS). The CQL compaction subproperties for TWCS are also supported.
-
config: Set to the name of your tiered storage strategy that is defined in the dse.yaml file.
-
max_tier_ages: Provide comma-separated values that define the maximum age in seconds for each tier, in priority order. Each value defines the maximum age of data that is stored in the corresponding tier. For example, 'max_tier_ages': '3600,7200' restricts the first tier to data that is aged 3600 seconds or less and the second tier to data that is aged 7200 seconds or less; all other data is routed to the third tier.
For the TimeWindowStorageStrategy (TWSS), DataStax recommends that you define one tier for each age in max_tier_ages, plus one additional tier for older data.
Because DSE uses only the tiers that are configured in both the table schema and the dse.yaml file, DSE Tiered Storage implicitly uses the last tier for all older data when max_tier_ages sets an age for every tier in dse.yaml.
For example, assume there are three tiers defined in dse.yaml:
-
If you set 'max_tier_ages': '3600,7200', DSE uses all three tiers. The first tier holds data aged 3600 seconds or less, the second tier holds data aged between 3600 and 7200 seconds, and the third tier holds data older than 7200 seconds.
-
If you set 'max_tier_ages': '3600', DSE uses only two tiers. All data older than 3600 seconds goes to the second tier, regardless of the existence of the third tier.
-
If you set 'max_tier_ages': '3600,7200,10800', DSE uses all three tiers, but it ignores the third age because there are only three tiers defined in dse.yaml. Any data that is older than 7200 seconds goes to the third tier, including data older than 10800 seconds.
For this configuration to respect the 10800-second age limit, you must define a fourth tier in dse.yaml.
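The tier selection rules above can be sketched in Python as follows. This is an illustrative model only, not DSE code: the function names parse_max_tier_ages and tier_for_age are hypothetical, tiers are numbered from 0 (fastest first), and the sketch assumes that data aged exactly at a threshold stays in the faster tier.

```python
def parse_max_tier_ages(value):
    """Turn a max_tier_ages string such as '3600,7200' into [3600, 7200]."""
    return [int(part) for part in value.split(",") if part.strip()]


def tier_for_age(age_seconds, max_tier_ages, tiers_in_dse_yaml):
    """Return the 0-based tier index for data of the given age in seconds.

    Hypothetical helper that models the rules described on this page.
    """
    ages = parse_max_tier_ages(max_tier_ages)
    # Only the first (number of tiers - 1) ages act as thresholds. An age
    # given for the last configured tier is ignored, because that tier
    # receives all older data.
    thresholds = ages[: max(tiers_in_dse_yaml - 1, 0)]
    for index, limit in enumerate(thresholds):
        if age_seconds <= limit:
            return index
    # Older than every threshold: the tier after the last threshold.
    return len(thresholds)


# Three tiers in dse.yaml, probed with data aged 1800, 5000, and 20000 seconds.
for setting in ("3600,7200", "3600", "3600,7200,10800"):
    print(setting, [tier_for_age(age, setting, 3) for age in (1800, 5000, 20000)])
```

With three tiers, the settings '3600,7200' and '3600,7200,10800' both place the three sample ages in tiers 0, 1, and 2, while '3600' places them in tiers 0, 1, and 1, which leaves the third tier unused. Passing 4 as the tier count lets the 10800 threshold take effect.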