Configure DSE Tiered Storage

You configure how DataStax Enterprise (DSE) moves data between storage media at two levels, the node level and the schema level:

  • Configure the storage strategies at the node level in the dse.yaml file. Each strategy defines a set of tiers, and each tier defines its storage locations.

    Use OpsCenter Lifecycle Manager to run a Configure job to push the configuration to applicable nodes.

    Multiple configurations can be defined for different use cases. Multiple heterogeneous disk configurations are supported.

    DataStax recommends local configuration testing before deploying cluster wide.

  • Configure the age policy at the schema level.

    The only supported data usage policy is partition age. Tier age thresholds are set when a table is created or altered to use the compaction strategy TieredCompactionStrategy.

The data sets used by DSE Tiered Storage can be very large. Search limitations and known Apache Solr™ issues apply.

  1. In the dse.yaml file on each node, uncomment the tiered_storage_options section.

  2. For each tiered storage strategy, define the configuration name, the storage tiers, and the data directory locations for each tier:

    1. Define storage tiers in priority order with the fastest storage media in the tier that is listed first.

    2. For each tier, define the data directory locations:

      tiered_storage_options:
        CONFIG_NAME:
          tiers:
            - paths:
                - PATH_TO_DIRECTORY
            - paths:
                - PATH_TO_DIRECTORY

      Replace CONFIG_NAME with the name of the tiered storage strategy, such as strategy1, that you will reference with the CREATE TABLE or ALTER TABLE statements. This name must be the same across all nodes.

      In the tiers section, describe your storage tier hierarchy in order from highest to lowest priority, starting with the fastest storage media. Each paths entry defines one tier, and each PATH_TO_DIRECTORY is a data directory location for that tier. A tier can list one or more data directory locations.

      These paths are used only to store data that is configured to use tiered storage. They are independent of any settings in the cassandra.yaml file.

      In the following example, the tiered storage configuration is named strategy1, and it has three different storage tiers where /mnt1 and /mnt2 are the highest priority storage locations:

      tiered_storage_options:
        strategy1:
          tiers:
            - paths:
                - /mnt1
                - /mnt2
            - paths:
                - /mnt3
                - /mnt4
            - paths:
                - /mnt5
                - /mnt6

  3. To apply a tiered storage strategy to a table, use a CREATE TABLE or ALTER TABLE statement.

    For example, to apply a tiered storage strategy named strategy1 to a table named ks.tbl:

    CREATE TABLE ks.tbl (k INT, c INT, v INT, PRIMARY KEY (k, c))
    WITH COMPACTION={'class':'org.apache.cassandra.db.compaction.TieredCompactionStrategy',
        'tiering_strategy': 'TimeWindowStorageStrategy',
        'config': 'strategy1',
        'max_tier_ages': '3600,7200'};

    The following WITH COMPACTION properties are used to configure DSE Tiered Storage:

    • class: To use tiered storage, this must be set to 'org.apache.cassandra.db.compaction.TieredCompactionStrategy'.

    • tiering_strategy: Set to 'TimeWindowStorageStrategy' to use the TimeWindowStorageStrategy (TWSS) to determine which tier to move the data to. TWSS is a DSE Tiered Storage strategy that uses TimeWindowCompactionStrategy (TWCS). The CQL compaction subproperties for TWCS are also supported.

    • config: Set to the name of your tiered storage strategy that is defined in the dse.yaml file.

    • max_tier_ages: Provide comma-separated values that define the maximum age, in seconds, of the data stored in each tier, in priority order. For example, 'max_tier_ages': '3600,7200' restricts the first tier to data aged 3600 seconds or less and the second tier to data aged 7200 seconds or less. All older data is routed to the third tier.

      For the TimeWindowStorageStrategy (TWSS), DataStax recommends that you define one tier for each age in max_tier_ages plus one additional tier for older data.

      DSE uses only the tiers that are configured in both the table schema and the dse.yaml file. If max_tier_ages sets an age for every tier in dse.yaml, DSE Tiered Storage ignores the last age and implicitly uses the last tier for all older data.

      For example, assume there are three tiers defined in dse.yaml:

      • If you set 'max_tier_ages': '3600,7200', DSE uses all three tiers. The first tier is for data newer than 3600 seconds, the second tier is for data aged between 3600 seconds and 7200 seconds, and the third tier is for data older than 7200 seconds.

      • If you set 'max_tier_ages': '3600', DSE uses only two tiers. All data older than 3600 seconds goes to the second tier, regardless of the existence of the third tier.

      • If you set 'max_tier_ages': '3600,7200,10800', DSE uses all three tiers, but it ignores the third age because there are only three tiers defined in dse.yaml. Any data that is older than 7200 seconds goes to the third tier, including data older than 10800 seconds.

        For this configuration to respect the 10800 age limit, you must define a fourth tier in dse.yaml.
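The interaction between max_tier_ages and the number of tiers in dse.yaml can be sketched as a small Python model. This is only an illustration of the rules described above, not DSE code: the function names parse_max_tier_ages, tier_for_age, and tiers_used are hypothetical, tiers are numbered from 0, and treating an age exactly equal to a threshold as still belonging to the earlier tier is an assumption.

```python
from typing import List


def parse_max_tier_ages(value: str) -> List[int]:
    """Parse a max_tier_ages string such as '3600,7200' into integers."""
    return [int(part) for part in value.split(",") if part.strip()]


def effective_ages(max_tier_ages: str, num_tiers: int) -> List[int]:
    """Return the age thresholds that can actually take effect.

    With num_tiers tiers in dse.yaml, at most num_tiers - 1 thresholds
    matter, because the last usable tier receives all older data.
    """
    if num_tiers < 1:
        raise ValueError("at least one tier must be defined")
    return parse_max_tier_ages(max_tier_ages)[: num_tiers - 1]


def tier_for_age(age_seconds: int, max_tier_ages: str, num_tiers: int) -> int:
    """Return the zero-based index of the tier for data of the given age."""
    ages = effective_ages(max_tier_ages, num_tiers)
    for index, max_age in enumerate(ages):
        # Assumption: an age equal to the threshold stays in this tier.
        if age_seconds <= max_age:
            return index
    # Older than every effective threshold: use the tier after the last one.
    return len(ages)


def tiers_used(max_tier_ages: str, num_tiers: int) -> int:
    """Return how many of the configured tiers can receive data."""
    return len(effective_ages(max_tier_ages, num_tiers)) + 1


if __name__ == "__main__":
    # Three tiers in dse.yaml, as in the examples above.
    for setting in ("3600,7200", "3600", "3600,7200,10800"):
        print(setting, "uses", tiers_used(setting, 3), "tiers")
```

Under these assumptions, tier_for_age(20000, '3600,7200,10800', 3) returns 2 (the third tier), while the same call with four tiers returns 3, which mirrors the need for a fourth tier before the 10800 limit takes effect.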
