Configuring DSE Tiered Storage

Data movement between storage media is configured at two levels, the node level and the schema level:

  • At the node level, configure the storage strategies in the dse.yaml file. Each strategy defines its tiers, and each tier defines its storage locations.

    Use OpsCenter Lifecycle Manager to run a Configure job to push the configuration to applicable nodes.

    Multiple configurations can be defined for different use cases. Multiple heterogeneous disk configurations are supported.

    DataStax recommends local configuration testing before deploying cluster wide.

  • Configure the age policy at the schema level.

    The only supported data usage policy is partition age. Tier age thresholds are set when a table is created with the compaction strategy TieredCompactionStrategy.

The data sets used by DSE Tiered Storage can be very large. Search limitations and known Apache Solr™ issues apply.

Procedure

Where is the dse.yaml file?

The location of the dse.yaml file depends on the type of installation:

Installation type                                              Location
Package installations + Installer-Services installations       /etc/dse/dse.yaml
Tarball installations + Installer-No Services installations    <installation_location>/resources/dse/conf/dse.yaml
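As an illustration of the two locations listed above, the following minimal sketch checks them in order and returns the first dse.yaml it finds. The function name find_dse_yaml and its parameters are hypothetical and are not part of any DataStax tooling; the installation location must be supplied by the caller because it differs per tarball installation.

```python
from pathlib import Path
from typing import Optional

# Location for package and Installer-Services installations, as listed above.
PACKAGE_DSE_YAML = Path("/etc/dse/dse.yaml")


def find_dse_yaml(installation_location: Optional[str] = None,
                  package_path: Path = PACKAGE_DSE_YAML) -> Optional[Path]:
    """Return the first dse.yaml that exists, or None if none is found.

    Hypothetical helper for illustration only.
    """
    candidates = [package_path]
    if installation_location is not None:
        # Layout for tarball and Installer-No Services installations.
        candidates.append(
            Path(installation_location) / "resources" / "dse" / "conf" / "dse.yaml"
        )
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Without an installation location, only the package path is checked.
    print(find_dse_yaml())
```

The package_path parameter exists only so that the lookup can be exercised on a machine where /etc/dse does not exist.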

  1. In the dse.yaml file on each node, uncomment the tiered_storage_options section.

  2. For each tiered storage strategy, define the configuration name, the storage tiers, and the data directory locations for each tier.

    1. Define storage tiers in priority order with the fastest storage media in the tier that is listed first.

    2. For each tier, define the data directory locations.

      Use this format, where config_name is the tiered storage strategy that you reference in the CREATE TABLE or ALTER TABLE statements. The config_name must be the same across all nodes:

      tiered_storage_options:
        config_name:
          tiers:
            - paths:
                - path_to_directory1
            - paths:
                - path_to_directory2

      where:

      • config_name is the configurable name of the tiered storage configuration strategy. For example: strategy1.

      • tiers is the section that defines the storage tiers. Each entry is one tier, and the order of the entries defines the priority order.

      • paths is the section of file paths that define the data directories for this tier of the disk configuration.

      Typically list the fastest storage media first. These paths are used only to store data that is configured to use tiered storage. These paths are independent of any settings in the cassandra.yaml file. For example, the tiered storage configuration named strategy1 has three different storage tiers ordered in priority (the first tier listed has highest priority):

      tiered_storage_options:
        strategy1:
          tiers:
            - paths:
                - /mnt1
                - /mnt2
            - paths:
                - /mnt3
                - /mnt4
            - paths:
                - /mnt5
                - /mnt6

  3. To apply the tiered storage strategies to selected tables, use CREATE TABLE or ALTER TABLE statements.

    For example, to apply tiered storage to table ks.tbl:

    CREATE TABLE ks.tbl (k INT, c INT, v INT, PRIMARY KEY (k, c))
    WITH COMPACTION={'class':'org.apache.cassandra.db.compaction.TieredCompactionStrategy',
        'tiering_strategy': 'TimeWindowStorageStrategy',
        'config': 'strategy1',
        'max_tier_ages': '3600,7200'};

    Configure tiered storage with these compaction options:

    • class

      'class':'org.apache.cassandra.db.compaction.TieredCompactionStrategy' configures a table to use tiered storage.

    • tiering_strategy

      'tiering_strategy': 'TimeWindowStorageStrategy' uses TimeWindowStorageStrategy (TWSS) to determine which tier to move the data to. TWSS is a DSE Tiered Storage strategy that uses TimeWindowCompactionStrategy (TWCS).

    • config

      'config': 'strategy1' specifies the strategy to use, as configured in the dse.yaml file, in this case strategy1.

    • max_tier_ages

      'max_tier_ages': '3600,7200' uses the values in a comma-separated list to define the maximum age per tier, in seconds, where:

      • 3600 restricts the first tier to data that is aged an hour (3600 seconds) or less.

      • 7200 restricts the second tier to data that is aged two hours (7200 seconds) or less.

      • All other data is routed to the data directory locations that are defined for the third tier.

      For TimeWindowStorageStrategy (TWSS), DataStax recommends that one tier be defined for each time age that is specified for max_tier_ages, plus another tier for older data. However, DataStax Enterprise uses only the tiers that are configured in the table schema and the dse.yaml file.

      An implicit tier exists that represents the oldest data. For example, for a strategy with three tiers in dse.yaml:

      • 'max_tier_ages': '3600,7200' uses three tiers. Tier 0 holds data newer than 3600 seconds, tier 1 holds data between 3600 seconds and 7200 seconds old, and tier 2 holds data older than 7200 seconds.

      • 'max_tier_ages': '3600' uses only the first two tiers.

      • 'max_tier_ages': '3600,7200,10800' uses all three tiers, but ignores the last value. Any data that does not belong in the first two tiers goes to the third tier, whether or not the data is older than 10800 seconds.
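The max_tier_ages examples above can be worked through with a small sketch. This is only a model of the behavior described in this section, not DSE's implementation. The function names are hypothetical, and the handling of data whose age exactly equals a threshold is an assumption (the text says both "or less" and "newer than").

```python
from typing import List


def parse_max_tier_ages(value: str) -> List[int]:
    """Parse a max_tier_ages string such as '3600,7200' into seconds."""
    return [int(part) for part in value.split(",") if part.strip()]


def tier_for_age(age_seconds: int, max_tier_ages: str, configured_tiers: int) -> int:
    """Return the zero-based tier index for data of the given age.

    Assumption: data whose age equals a threshold stays in the younger tier.
    """
    if configured_tiers < 1:
        raise ValueError("at least one tier must be configured")
    # Only configured_tiers - 1 thresholds can be honored; extra values are ignored.
    thresholds = parse_max_tier_ages(max_tier_ages)[: configured_tiers - 1]
    for tier, threshold in enumerate(thresholds):
        if age_seconds <= threshold:
            return tier
    # The implicit last tier receives everything older than the final threshold.
    return len(thresholds)


if __name__ == "__main__":
    # Three tiers in dse.yaml, as in strategy1.
    for ages in ("3600,7200", "3600", "3600,7200,10800"):
        for age in (100, 5000, 9000, 20000):
            print(ages, age, tier_for_age(age, ages, configured_tiers=3))
```

With three configured tiers, '3600' never returns tier 2, and '3600,7200,10800' places both 9000-second-old and 20000-second-old data in tier 2, which matches the three cases listed above.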

The CQL compaction subproperties for TWCS are also supported.
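Step 3 mentions ALTER TABLE, but only a CREATE TABLE statement is shown. The sketch below renders an ALTER TABLE statement with the same four compaction options, assuming that ALTER TABLE accepts them in the standard WITH compaction = {...} form. The helper name is hypothetical, and it performs plain string formatting only, so it is unsuitable for untrusted input.

```python
from typing import Sequence


def alter_table_tiered_storage(table: str, config: str,
                               max_tier_ages: Sequence[int]) -> str:
    """Render an ALTER TABLE statement that applies a tiered storage strategy.

    Hypothetical helper; the option names are taken from the CREATE TABLE example.
    """
    options = {
        "class": "org.apache.cassandra.db.compaction.TieredCompactionStrategy",
        "tiering_strategy": "TimeWindowStorageStrategy",
        "config": config,
        # max_tier_ages is a single comma-separated string of seconds.
        "max_tier_ages": ",".join(str(age) for age in max_tier_ages),
    }
    rendered = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {table} WITH compaction = {{{rendered}}};"


if __name__ == "__main__":
    print(alter_table_tiered_storage("ks.tbl", "strategy1", [3600, 7200]))
```

The config value must match a strategy name that is defined in dse.yaml on every node, for example strategy1.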

