Bulk Loading Data between TDE-enabled Clusters
A common operation in database environments is to bulk load data between clusters. For example, to facilitate testing of new functionality, you may need to load large amounts of data from a production environment to your development environment. When Transparent Data Encryption (TDE) is enabled, these secure environments require additional steps to ensure that the valid encryption keys are in place.
There are two types of keys used while streaming encrypted data:
-
Decryptor
Used to decrypt the SSTable during streaming. The decryptor must be the same key used to encrypt the data on the source cluster.
-
Encryptor
Used to encrypt the SSTable on the target cluster. The key is the one configured in the encryption option for the CQL table schema on the target cluster.
The decryptor and encryptor could be the same key or different keys. If you encounter errors during bulk data loading between clusters, the cause may be that your environment uses different keys, and the wrong key was used during decryption.
To bulk load data between two TDE-enabled clusters, follow these steps:
Procedure
-
Copy the encryption key file used on the source cluster to the target cluster. The key resides in the directory identified by the system_key_directory option in dse.yaml. The default directory for the encryption key file is /etc/dse/conf. Do not change the name of the encryption key file when you copy it from the source to the target cluster. For example, if the key file is named our_system_key on the source cluster, the same file name must be used on the target cluster, and the file must be placed in the target cluster's designated system_key_directory.
The default key file name, system_key, is often used on different clusters. If that is true for your environment, copying the key file from the source cluster to the target cluster causes a conflict, because two different keys with the same name cannot exist in the same directory. To avoid this scenario, rekey the target cluster to use a different key name. You can rename the existing key or generate a new key. Refer to Rekeying existing data.
-
On the source cluster, get the key's entries from the dse_system.encrypted_keys table.
Example:
SELECT * FROM dse_system.encrypted_keys;

 key_file       | cipher | strength | key_id                               | key
----------------+--------+----------+--------------------------------------+----------------------------------------------
 our_system_key | AES    | 128      | d9b3dd70-c764-11e7-abc4-793ec23f8a8c | kmbYE1KLkmW3Hzg7dIPt1rWk3j6hR+gM7bxd/pRd7gU=
-
On the target cluster, insert the same key entry.
Example:
INSERT INTO dse_system.encrypted_keys (key_file, cipher, strength, key_id, key) VALUES ('our_system_key', 'AES', 128, 'd9b3dd70-c764-11e7-abc4-793ec23f8a8c', 'kmbYE1KLkmW3Hzg7dIPt1rWk3j6hR+gM7bxd/pRd7gU=');
-
On the target cluster, verify that your added entry is in the dse_system.encrypted_keys table.
Example:
SELECT * FROM dse_system.encrypted_keys;

 key_file        | cipher | strength | key_id                               | key
-----------------+--------+----------+--------------------------------------+------------------------------------------------------------------
 our_system_key  | AES    | 128      | d9b3dd70-c764-11e7-abc4-793ec23f8a8c | kmbYE1KLkmW3Hzg7dIPt1rWk3j6hR+gM7bxd/pRd7gU=
 system_key_dev  | AES    | 256      | 81847700-c99d-11e7-b9d9-23f36e5077c2 | 6YXE07AcEv61jvT6x7rdj6AHde0N6OHzxALNRnW1s7nVDFFQDArh64LousF8bXmy

If you use the same key as decryptor and encryptor, the SELECT output shows only one key.
-
If you change the encryption setting on the target cluster, then run the following command on all nodes in the target cluster to rewrite the SSTables using the new encryption key:
nodetool upgradesstables --include-all-sstables
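The procedure above can be sketched as a small shell script. This is an illustration under stated assumptions, not a supported tool: the host name target-node1, the DRY_RUN switch, and the function names are hypothetical, and the sketch assumes both clusters use the same system_key_directory and that scp, ssh, and cqlsh can reach the nodes without extra authentication options. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: host names, function names, and DRY_RUN are hypothetical.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"                     # 1 = print commands instead of running them
KEY_DIR="${KEY_DIR:-/etc/dse/conf}"         # default system_key_directory (assumed same on both clusters)
KEY_FILE="${KEY_FILE:-our_system_key}"      # key file name used on the source cluster
TARGET_HOST="${TARGET_HOST:-target-node1}"  # hypothetical target node

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: refuse to overwrite a same-named key on the target,
# then copy the key file without renaming it.
copy_key() {
  run ssh "$TARGET_HOST" test ! -e "$KEY_DIR/$KEY_FILE"
  run scp "$KEY_DIR/$KEY_FILE" "$TARGET_HOST:$KEY_DIR/$KEY_FILE"
}

# Steps 2 and 4: list the key entries on the given host.
list_keys() {
  run cqlsh "$1" -e "SELECT * FROM dse_system.encrypted_keys;"
}

# Step 3: insert the source key entry on the target.
# Args: key_file cipher strength key_id key
insert_key() {
  run cqlsh "$TARGET_HOST" -e "INSERT INTO dse_system.encrypted_keys (key_file, cipher, strength, key_id, key) VALUES ('$1', '$2', $3, '$4', '$5');"
}

# Step 5: rewrite SSTables with the new key (run on every target node).
rewrite_sstables() {
  run nodetool upgradesstables --include-all-sstables
}
```

A typical order of calls would be copy_key, list_keys on a source node, insert_key with the values returned, list_keys on the target node, and rewrite_sstables only if the encryption setting on the target changed. Reviewing the dry-run output before setting DRY_RUN=0 helps catch a wrong key file name or directory.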
Results
After performing the prior steps, sstableloader should be able to run successfully during bulk data loading operations between two TDE-enabled clusters.
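As an illustration of the final load, the following sketch wraps a basic sstableloader call. The host name, the SSTable directory, the load_table function, and the DRY_RUN switch are hypothetical; the directory is assumed to end in the keyspace and table names, and any authentication or SSL options your environment needs are left out.

```shell
#!/usr/bin/env bash
# Sketch only: load_table, DRY_RUN, and the example arguments are hypothetical.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command instead of running it

# Stream one table's SSTables to the target cluster.
# Args: target_host sstable_dir (a path ending in <keyspace>/<table>)
load_table() {
  local target_host="$1" sstable_dir="$2"
  if [ "$DRY_RUN" = "1" ]; then
    echo "sstableloader -d $target_host $sstable_dir"
  else
    sstableloader -d "$target_host" "$sstable_dir"
  fi
}
```

For example, load_table target-node1 /path/to/ks1/tbl1 prints the command in dry-run mode. If the load fails with decryption errors, recheck that the source key file and its dse_system.encrypted_keys entry are both present on the target cluster.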