Configuring DSE Tiered Storage

Documentation on configuring DSE Tiered Storage to automate the smart movement of data between storage media.

Configuring the data movement between storage media takes place at the node level and the schema level:
  • At the node level, configure the storage strategies in the dse.yaml file. Each strategy defines the storage tiers, and each tier defines its storage locations.
    Tip: Use OpsCenter Lifecycle Manager to run a Configure job to push the configuration to applicable nodes.

    Multiple configurations can be defined for different use cases. Multiple heterogeneous disk configurations are supported.

    DataStax recommends local configuration testing before deploying cluster wide.

  • Configure the age policy at the schema level.

    The only supported data usage policy is partition age. The only supported tiering strategy is DateTieredStorageStrategy. Tier age thresholds are set when a table is created or altered with the compaction class TieredCompactionStrategy.

The data sets used by DSE Tiered Storage can be very large. Search limitations and known Apache Solr issues apply.

The location of the dse.yaml file depends on the type of installation:
Installation type         Location
Installer-Services        /etc/dse/dse.yaml
Package installations     /etc/dse/dse.yaml
Installer-No Services     install_location/resources/dse/conf/dse.yaml
Tarball installations     install_location/resources/dse/conf/dse.yaml

Procedure

  1. In the dse.yaml file on each node, uncomment the tiered_storage_options section.
  2. For each tiered storage strategy, define the configuration name, the storage tiers, and the data directory locations for each tier.
    1. Define storage tiers in priority order with the fastest storage media in the tier that is listed first.
    2. For each tier, define the data directory locations.
    Use this format, where config_name is the name of the tiered storage strategy that you reference in CREATE TABLE or ALTER TABLE statements. The config_name must be the same on all nodes:
    tiered_storage_options:
      config_name:
        tiers:
          - paths:
              - path_to_directory1
          - paths:
              - path_to_directory2
    where:
    • config_name is the configurable name of the tiered storage configuration strategy. For example: strategy1.
    • tiers is the section that lists the storage tiers in priority order. Each tier is defined by its paths entry.
    • paths is the section of file paths that define the data directories for this tier of the disk configuration. The tier that is listed first is the top tier that typically accesses the fastest storage media. These paths are used only to store data that is configured to use tiered storage. These paths are independent of any settings in the cassandra.yaml file.
    For example, the tiered storage configuration named strategy1 has three different storage tiers ordered in priority (the first tier listed has highest priority):
    tiered_storage_options:
        strategy1:
          tiers: 
            - paths:
              - /mnt1
              - /mnt2
            - paths:
              - /mnt3
              - /mnt4
            - paths:
              - /mnt5
              - /mnt6
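Because the config_name must be the same across all nodes, it can help to compare the strategy names before deploying. The following is a minimal sketch, assuming each node's tiered_storage_options section has already been parsed into a Python dictionary by a YAML loader of your choice. The function name missing_strategy_names and the node labels are hypothetical and are not part of DSE.

```python
def missing_strategy_names(options_by_node):
    """Return {node: [strategy names missing on that node]}.

    options_by_node maps a node label to that node's parsed
    tiered_storage_options mapping (strategy name -> settings).
    Nodes that define every strategy name are omitted from the result.
    """
    # Union of every strategy name seen on any node.
    all_names = set()
    for options in options_by_node.values():
        all_names.update(options)

    missing = {}
    for node, options in options_by_node.items():
        absent = sorted(all_names - set(options))
        if absent:
            missing[node] = absent
    return missing


# Same tier layout as the strategy1 example above.
strategy1 = {"tiers": [{"paths": ["/mnt1", "/mnt2"]},
                       {"paths": ["/mnt3", "/mnt4"]},
                       {"paths": ["/mnt5", "/mnt6"]}]}

nodes = {
    "node1": {"strategy1": strategy1},
    "node2": {"strategy_1": strategy1},  # name differs on this node
}
print(missing_strategy_names(nodes))
```

An empty result means every node defines the same set of strategy names. The sketch compares names only, not the tier paths, which may legitimately differ between nodes with different disk layouts.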
  3. To apply the tiered storage strategies to selected tables, use CREATE TABLE or ALTER TABLE statements.
    For example, to apply tiered storage to table ks.tbl:
    CREATE TABLE ks.tbl (k INT, c INT, v INT, PRIMARY KEY (k, c)) 
    WITH COMPACTION={'class':'org.apache.cassandra.db.compaction.TieredCompactionStrategy',  
        'tiering_strategy': 'DateTieredStorageStrategy', 
        'config': 'strategy1',  
        'max_tier_ages': '3600,7200'};
    Set the tiering behavior with the following compaction options:
    • class

      'class':'org.apache.cassandra.db.compaction.TieredCompactionStrategy' configures a table to use tiered storage.

    • tiering_strategy

      'tiering_strategy': 'DateTieredStorageStrategy' uses DateTieredStorageStrategy to determine which tier to move the data to. DateTieredStorageStrategy is the only supported strategy.

    • config

      'config': 'strategy1' specifies to use the strategy that is configured in the dse.yaml file, in this case strategy1.

    • max_tier_ages
      'max_tier_ages': '3600,7200' uses the values in a comma-separated list to define the maximum age per tier, in seconds, where:
      • 3600 restricts the first tier to data that is aged an hour (3600 seconds) or less.
      • 7200 restricts the second tier to data that is aged two hours (7200 seconds) or less.
      • All other data is routed to the data directory locations that are defined for the third tier.
      Note: For DateTieredStorageStrategy, DataStax recommends that one tier be defined for each time age that is specified for max_tier_ages, plus another tier for older data. However, DataStax Enterprise uses only the tiers that are configured in the table schema and the dse.yaml file.
      An implicit tier exists that represents the oldest data. For example, for a strategy with three tiers in dse.yaml:
      • 'max_tier_ages': '3600,7200' uses three tiers. Tier 0 would be for data newer than 3600 seconds, tier 1 would be for data between 3600 seconds and 7200 seconds, and tier 2 would be for data older than 7200 seconds.
      • 'max_tier_ages': '3600' uses only the first two tiers.
      • 'max_tier_ages': '3600,7200,10800' uses all three tiers, but ignores the last value. Any data that did not belong in the first two tiers goes to the third tier, whether the data was older than 10800 seconds or not.
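The mapping from data age to tier described above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not DSE's implementation: the function names are hypothetical, and the handling of data whose age is exactly equal to a threshold (placed in the younger tier here) is an assumption.

```python
def parse_max_tier_ages(value):
    """Parse a max_tier_ages string such as '3600,7200' into seconds."""
    return [int(part) for part in value.split(",") if part.strip()]


def select_tier(age_seconds, max_tier_ages, configured_tiers):
    """Return the zero-based tier index for data of the given age.

    configured_tiers is the number of tiers defined for the strategy
    in dse.yaml. The last usable tier is the implicit catch-all tier
    for the oldest data.
    """
    if configured_tiers < 1:
        raise ValueError("at least one tier must be configured")
    thresholds = parse_max_tier_ages(max_tier_ages)
    # Thresholds beyond (configured_tiers - 1) are ignored, because the
    # final configured tier receives all remaining data.
    usable = thresholds[:configured_tiers - 1]
    for index, max_age in enumerate(usable):
        if age_seconds <= max_age:  # assumption: boundary is inclusive
            return index
    return len(usable)


# Three tiers in dse.yaml, as in the strategy1 example.
for ages in ("3600,7200", "3600", "3600,7200,10800"):
    print(ages, [select_tier(age, ages, 3) for age in (1800, 5000, 20000)])
```

With three configured tiers, '3600,7200' and '3600,7200,10800' produce the same placement, which matches the statement that the last value is ignored.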
    The CQL compaction subproperties for DTCS are also supported.
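The procedure shows only a CREATE TABLE statement. As a hedged sketch, the same compaction options can be applied to an existing table with ALTER TABLE. The Python helper below only builds the statement text from the option names used in this topic. The helper name tiered_compaction_cql is hypothetical, it performs no input validation or escaping, and the resulting statement still has to be run through cqlsh or a driver.

```python
def tiered_compaction_cql(table, config, max_tier_ages):
    """Build an ALTER TABLE statement that enables tiered storage.

    table         -- keyspace-qualified table name, for example 'ks.tbl'
    config        -- strategy name defined in dse.yaml, for example 'strategy1'
    max_tier_ages -- iterable of tier age thresholds in seconds
    """
    ages = ",".join(str(age) for age in max_tier_ages)
    return (
        f"ALTER TABLE {table} WITH compaction = {{"
        "'class': 'org.apache.cassandra.db.compaction.TieredCompactionStrategy', "
        "'tiering_strategy': 'DateTieredStorageStrategy', "
        f"'config': '{config}', "
        f"'max_tier_ages': '{ages}'}};"
    )


statement = tiered_compaction_cql("ks.tbl", "strategy1", [3600, 7200])
print(statement)
```

Generating the statement from one place keeps the config value aligned with the strategy name used in dse.yaml when several tables share a strategy.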