Bulk Loading Data between TDE-enabled Clusters

Bulk loading data between clusters is a common operation in database environments. For example, to test new functionality, you may need to load large amounts of data from a production environment into your development environment. When Transparent Data Encryption (TDE) is enabled, these environments require additional steps to ensure that the correct encryption keys are in place.

There are two types of keys used while streaming encrypted data:

  1. Decryptor

    Used to decrypt the SSTable during streaming. The decryptor must be the same key used to encrypt the data on the source cluster.

  2. Encryptor

    Used to encrypt the SSTable on the target cluster. The encryptor is the key configured in the encryption options of the CQL table schema on the target cluster.

The decryptor and encryptor can be the same key or different keys. If you encounter errors while bulk loading data between clusters, a likely cause is that the environments use different keys and the wrong key was used during decryption.
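
For reference, the encryptor is determined by the target table's own encryption settings. A table on the target cluster might enable TDE through its compression options along the lines of the following sketch; the exact class and option names depend on your DSE version, and the keyspace, table, and key file names are placeholders:

    ALTER TABLE my_keyspace.my_table
    WITH compression = {
      'class' : 'Encryptor',
      'cipher_algorithm' : 'AES/CBC/PKCS5Padding',
      'secret_key_strength' : 128,
      'system_key_file' : 'our_system_key'
    };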

To bulk load data between two TDE-enabled clusters, follow these steps:

Procedure

  1. Copy the encryption key file used on the source cluster to the target cluster. The key resides in the directory identified by the system_key_directory option in dse.yaml. The default directory for the encryption key file is /etc/dse/conf. Do not change the name of the encryption key file when you copy it from the source to the target cluster. For example, if the key file is named our_system_key on the source cluster, the same file name must be used on the target cluster, and the file must be placed in the target cluster’s designated system_key_directory. See the copy example after the following note.

    The default key file name, system_key, is often used on multiple clusters. If both clusters use the default name, copying the key file from the source cluster to the target cluster creates a conflict, because two different keys with the same name cannot exist in the same directory. To avoid this scenario, rekey the target cluster to use a different key name; you can rename the existing key or generate a new key. Refer to Rekeying existing data.
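
    Assuming SSH access between the clusters and the default key directory, the key file could be copied with commands like the following. The host name, admin account, and the cassandra service account are placeholders; adjust them for your environment:

    scp /etc/dse/conf/our_system_key admin@target-node1:/tmp/our_system_key
    # On the target node, move the key into place and restrict access to the DSE service account
    sudo mv /tmp/our_system_key /etc/dse/conf/our_system_key
    sudo chown cassandra:cassandra /etc/dse/conf/our_system_key
    sudo chmod 600 /etc/dse/conf/our_system_key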

  2. On the source cluster, get the key’s entries from the dse_system.encrypted_keys table.

    Example:

    SELECT * FROM dse_system.encrypted_keys;
    
    key_file       | cipher | strength | key_id                               | key
    ---------------+--------+----------+--------------------------------------+---------
    our_system_key | AES    | 128      | d9b3dd70-c764-11e7-abc4-793ec23f8a8c | kmbYE1KLkmW3Hzg7dIPt1rWk3j6hR+gM7bxd/pRd7gU=

  3. On the target cluster, insert the same key entry.

    Example:

    INSERT INTO dse_system.encrypted_keys (key_file, cipher, strength, key_id, key) VALUES
    ('our_system_key', 'AES', 128, d9b3dd70-c764-11e7-abc4-793ec23f8a8c, 'kmbYE1KLkmW3Hzg7dIPt1rWk3j6hR+gM7bxd/pRd7gU=');

  4. On the target cluster, verify that your added entry is in the dse_system.encrypted_keys table.

    Example:

    SELECT * FROM dse_system.encrypted_keys;
    
    key_file       | cipher | strength | key_id                               | key
    ---------------+--------+----------+--------------------------------------+---------
    our_system_key | AES    | 128      | d9b3dd70-c764-11e7-abc4-793ec23f8a8c | kmbYE1KLkmW3Hzg7dIPt1rWk3j6hR+gM7bxd/pRd7gU=
    system_key_dev | AES    | 256      | 81847700-c99d-11e7-b9d9-23f36e5077c2 | 6YXE07AcEv61jvT6x7rdj6AHde0N6OHzxALNRnW1s7nVDFFQDArh64LousF8bXmy

    If you use the same key as both the decryptor and the encryptor, the SELECT output shows only one key.

  5. If you changed the encryption settings on the target cluster, run the following command on all nodes in the target cluster to rewrite the SSTables with the new encryption key:

    nodetool upgradesstables --include-all-sstables
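
    If only specific tables changed their encryption settings, you can limit the rewrite to those tables. The keyspace and table names below are placeholders:

    nodetool upgradesstables --include-all-sstables my_keyspace my_table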

Results

After you complete the preceding steps, sstableloader should run successfully when bulk loading data between the two TDE-enabled clusters.
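
For example, an invocation like the following streams the SSTables for one table to the target cluster. The node address and directory path are placeholders; the source directory must follow the keyspace/table layout that sstableloader expects:

    sstableloader -d 10.10.1.5 /path/to/backups/my_keyspace/my_table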
