Using DSE Advanced Replication
Operations including starting, stopping, and configuring DSE Advanced Replication.
- Starting DSE Advanced Replication
- Stopping DSE Advanced Replication
- Configuring global settings
- Configuring channel settings
- Encrypting driver passwords
- Data insert methods
Starting DSE Advanced Replication
Before you can start and use DSE Advanced Replication, you must create the user keyspaces and tables on the edge cluster and the hub cluster.
- Enable replication in the dse.yaml file. At the end of the file, uncomment all advanced_replication_options entries and set enabled: true.
  # Advanced Replication configuration settings
  advanced_replication_options:
      enabled: true
- Do a rolling restart: restart the nodes in the edge cluster one at a time while the other nodes continue to operate online.
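Before the rolling restart, it can help to confirm that the setting was actually uncommented and enabled on each node. The following is a minimal sketch in Python and is not part of DSE: the function name advanced_replication_enabled is hypothetical, and the line-based scan assumes the section layout shown in the step above. It is not a full YAML parser.

```python
import re


def advanced_replication_enabled(dse_yaml_text: str) -> bool:
    """Return True if advanced_replication_options has enabled: true.

    Assumption: the section is a top-level key with an indented
    'enabled:' entry, as in the dse.yaml fragment shown in this section.
    """
    in_section = False
    for raw in dse_yaml_text.splitlines():
        # Drop comments so a commented-out section is ignored.
        line = raw.split("#", 1)[0].rstrip()
        if not line.strip():
            continue
        if not line[0].isspace():
            # A new top-level key starts or ends the section of interest.
            in_section = line.strip() == "advanced_replication_options:"
            continue
        if in_section:
            match = re.fullmatch(r"\s+enabled:\s*(\S+)", line)
            if match:
                return match.group(1).lower() == "true"
    return False


sample = """\
# Advanced Replication configuration settings
advanced_replication_options:
    enabled: true
"""
print(advanced_replication_enabled(sample))
```

To check a real node, read the file contents from the dse.yaml location for your installation type and pass the text to the function.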
Disabling DSE Advanced Replication
- To disable replication, edit the dse.yaml file. In the advanced_replication_options section, set enabled: false.
  # Advanced Replication configuration settings
  advanced_replication_options:
      enabled: false
- Do a rolling restart: restart the nodes in the edge cluster one at a time while the other nodes continue to operate online.
- To clean out the data that was used for DSE Advanced Replication, remove these tables and this keyspace:
  cqlsh> drop table dse_system.advrep_conf;
  cqlsh> drop table dse_system.advrep_table_conf;
  cqlsh> drop keyspace advrep;
Configuring global settings
Global settings apply to the entire edge cluster. These global settings are stored in the CQL table dse_system.advrep_conf that is automatically created.
cqlsh> INSERT INTO dse_system.advrep_conf (conf_key, conf_val) VALUES ('edge_replication', 'true');
dse advrep edge conf ...
You can provide authentication credentials in several ways; see Credentials for authentication.
The following table describes the configuration keys and their default values, and identifies whether a restart of the edge node is required for the change to be recognized. Use configuration keys to change global settings.
Configuration key | Default value | Description | Restart required |
---|---|---|---|
hub-ip-addresses | none | REQUIRED. A comma separated list of IP addresses that are used to connect to the hub cluster using the Cassandra driver. | No |
hub-port | none | Specifies a non-default port for connecting to nodes on the hub cluster. To specify a non-default port: dse advrep edge conf --hub-port 9999 (the command reports: Set replication config hub_port from None to 9999). To remove the non-default port: dse advrep edge remove-conf --hub-port | Yes |
debug-output | False | Prints internal debug output about the replication log. | No |
cql-refresh-row-limit | |||
driver-allow-remote-dcs-for-local-cl | False | Set to true to enable automatic failover for hub clusters with multiple datacenters. The value of the driver-consistency-level parameter must be LOCAL_ONE or LOCAL_QUORUM. | Yes |
driver-compression | lz4 | The compression algorithm the driver uses to send data from the edge to the hub. Supported values are lz4 and snappy. | Yes |
driver-connect-timeout | 15000 | Time in milliseconds the driver waits to connect to a server. | No |
driver-connections | 32 | The number of connections the Cassandra driver will create. | Yes |
driver-connections-max | 256 | The maximum number of connections the Cassandra driver will create. | Yes |
driver-consistency-level | ONE | The consistency level used by the driver when executing statements for replicating data to the hub. Specify a valid Cassandra consistency level: ANY, ONE, TWO, THREE, QUORUM, ALL, LOCAL_QUORUM, EACH_QUORUM, SERIAL, LOCAL_SERIAL, or LOCAL_ONE. | No |
driver-local-dc | N/A | For hub clusters with multiple datacenters, you can explicitly define the name of the datacenter that you consider local. Typically, this is the datacenter that is closest to the edge cluster. This value is used only for clusters with multiple datacenters. | Yes |
driver-pwd | none | Driver password if the hub requires a user and password to connect. Note: By default, driver user names and passwords are plain text. DataStax recommends encrypting the driver passwords before you add them to the CQL table. | Yes |
driver-read-timeout | 15000 | Time in milliseconds the driver waits to read responses from a server. | No |
driver-ssl-enabled | false | Whether SSL is enabled for connection to the hub. | Yes |
driver_ssl_keystore_path | none | The path to the keystore for connection to Cassandra when SSL client authentication is enabled. | Yes |
driver_ssl_keystore_password | none | The keystore password for connection to Cassandra when SSL client authentication is enabled. | Yes |
driver_ssl_keystore_type | none | The keystore type for connection to Cassandra when SSL client authentication is enabled. | Yes |
driver_ssl_truststore_path | none | The path to the truststore for connection to Cassandra when SSL is enabled. | Yes |
driver-ssl-truststore-password | none | The truststore password for connection to Cassandra when SSL is enabled. | Yes |
driver-ssl-truststore-type | none | The truststore type for connection to Cassandra when SSL is enabled. | Yes |
driver-ssl-protocol | TLS | The SSL protocol for connection to Cassandra when SSL is enabled. | Yes |
driver-ssl-cipher-suites | none | A comma-separated list of SSL cipher suites for connection to Cassandra when SSL is enabled. Cipher suites must be supported by the edge machine. | Yes |
driver-used-hosts-per-remote-dc | 0 | To use automatic failover for hub clusters with multiple datacenters, you must define the number of hosts per remote datacenter that the datacenter aware round robin policy (DCAwareRoundRobinPolicy) considers available for use. | Yes |
driver-user | none | Driver username if the hub requires a user and password to connect. | Yes |
edge-id | N/A | Identifies this edge cluster and all inserts from this cluster. The edge-id must also exist in the primary key on the hub for population of the edge-id to occur. | No |
edge-id-col-name | edge-id | The column to use on remote tables to insert the edge id as part of the update. If this column is not present on the table that is being updated, the edge id value is ignored. | No |
edge-replication | False | If true, replication starts. If false and replication is running, replication is stopped. The replication log remains intact. | No |
invalid-message-log | SYSTEM_LOG | The logging strategy to adopt when an invalid message is discarded. SYSTEM_LOG: Log the CQL query and the error message in the system log on the hub. TABLE_LOG: Store the CQL query and the error message in the dse_advrep.advrep_invalid_messages_log Cassandra table on the hub. NONE: Perform no logging. See Managing invalid messages. | No |
invalid-message-max-retries | 15 | The maximum number of times to retry an invalid message before discarding it and removing it from the replication log. | No |
replication-log-audit-log-enabled | false | Specifies whether to store the audit log. | Yes |
replication-log-audit-log-file | /tmp/advrep_rl_audit.log | Specifies the file name prefix template for the audit log file. The file name is appended with .gz if compressed using gzip. | Yes |
replication-log-audit-log-file-gzipped | true | Specifies to compress the audit log output file using gzip compression. | Yes |
replication-log-audit-log-file-rotate-time-max-lifespan-mins | 0 | Specifies the maximum lifetime of audit log files. Periodically, when log files are rotated, audit log files are purged when they: | Yes |
replication-log-audit-log-file-rotate-time-mins | 60 | Specifies the time interval to rotate the audit log file. On rotation, the rotated file is appended with the log counter .[logcounter], incrementing from [0]. To disable rotation, set to 0. | Yes |
replication-log-buckets | 512:128:DAYS:1 | Defines the quantity and availability of buckets that messages are written to, in the format num_random_buckets:num_time_buckets:time_unit:num_time_unit. The num_random_buckets and num_time_buckets values are rounded to the nearest power of 2. Messages are written into a replication bucket. The row of buckets is determined by the time buckets, where each bucket spans time_unit * num_time_unit. There are num_time_buckets buckets at a time. Messages are written sequentially based on the time of the message. One time bucket can have messages in a specific time period, and also have messages for an earlier time period num_time_buckets * time_unit * num_time_unit. In the time-based row, the actual time bucket is randomly selected to minimize clustering of buckets, to increase performance, and to maintain a balanced distribution to the nodes in the edge cluster. | No |
replication-max-permits | 5000 | The maximum number of permits that are available for replication. Permits limit the number of concurrent operations that are submitted to the Cassandra driver. A permit must be acquired before a message can be sent to the hub. | Yes |
replication-min-permits | 100 | The minimum number of permits that are available for replication. | Yes |
replication-permit-timeout-ms | 2000 | If permits are not available, the maximum time the replication channel waits to acquire a permit. Applies only to messages sent by the slow track. | Yes |
replication-permits-tuning-delta-percentage | 20 | Specifies the delta, as a percentage, by which the number of available permits is increased or decreased when permits are tuned. | No |
replication-permits-tuning-window-ms | 60000 | Specifies the time interval, in milliseconds, after which permits should be tuned again. | No |
replication-timeouts-lower-bound-ratio | 3 | When the (timed out requests / total requests) ratio is below this percentage, the number of available permits is automatically increased by a delta. | No |
replication-timeouts-upper-bound-ratio | 15 | When the (timed out requests / total requests) ratio is above this percentage, the number of available permits is automatically decreased by a delta. | No |
replog-consumer-buffer | 100000 | The number of messages a single node reads at a time from the replication log. Decrease this value to reduce memory consumption at the cost of lower replication performance. | Yes |
slow-track-min-bandwidth-percentage | 10 | The minimum guaranteed percent of permits that are available for sending messages by using the slow track. | Yes |
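The replication-log-buckets format and the permit-tuning keys in the table above can be illustrated with a short Python sketch. This is not DSE code, and several details are assumptions made for illustration: the function names are hypothetical, ties in the power-of-2 rounding are rounded up, and the tuning delta is taken as a percentage of the current permit count, clamped between replication-min-permits and replication-max-permits.

```python
def nearest_power_of_two(n: int) -> int:
    """Round a positive integer to the nearest power of 2 (ties round up: an assumption)."""
    if n < 1:
        raise ValueError("bucket counts must be positive")
    lower = 1 << (n.bit_length() - 1)  # largest power of 2 that is <= n
    upper = lower << 1
    return lower if (n - lower) < (upper - n) else upper


def parse_bucket_spec(spec: str) -> dict:
    """Split num_random_buckets:num_time_buckets:time_unit:num_time_unit."""
    random_buckets, time_buckets, time_unit, num_time_unit = spec.split(":")
    return {
        "num_random_buckets": nearest_power_of_two(int(random_buckets)),
        "num_time_buckets": nearest_power_of_two(int(time_buckets)),
        "time_unit": time_unit,
        "num_time_unit": int(num_time_unit),
    }


def tune_permits(current, timed_out, total, *, min_permits=100, max_permits=5000,
                 delta_pct=20, lower_pct=3, upper_pct=15):
    """One tuning step; defaults mirror the documented default values."""
    if total == 0:
        return current  # nothing observed in this tuning window
    timeout_pct = 100.0 * timed_out / total
    delta = max(1, current * delta_pct // 100)  # assumption: delta is relative to current
    if timeout_pct < lower_pct:
        current += delta   # few timeouts: allow more concurrent operations
    elif timeout_pct > upper_pct:
        current -= delta   # many timeouts: back off
    return max(min_permits, min(max_permits, current))


print(parse_bucket_spec("512:128:DAYS:1"))
print(tune_permits(1000, timed_out=1, total=100))
print(tune_permits(1000, timed_out=20, total=100))
```

In this sketch, a window with a timeout ratio between the lower and upper bounds leaves the permit count unchanged.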
dse advrep edge list-conf
The result is:
field | value
--------------------------------------------------
debug_output | true
driver_connections | 16
replog_consumer_buffer | 20000
replication_log_audit_log_enabled | true
rep_channel_permits | 30000
hub_ip_addresses | 10.200.241.156
remote_logging_ipaddress | 10.200.164.72
edge_replication | true
driver_connections_max | 256
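If you need to inspect these settings from a script, the field | value layout shown above is simple to parse. The following Python sketch is hypothetical: parse_list_conf is not a DSE function, and it assumes the output keeps the two-column layout from the example, with a header row and a dashed separator.

```python
def parse_list_conf(output: str) -> dict:
    """Turn 'field | value' rows into a dict of strings."""
    settings = {}
    for line in output.splitlines():
        if "|" not in line:
            continue  # skips the dashed separator and blank lines
        field, value = (part.strip() for part in line.split("|", 1))
        if field == "field" and value == "value":
            continue  # skips the header row
        settings[field] = value
    return settings


# Sample text copied from the example output in this section.
sample = """\
field | value
--------------------------------------------------
debug_output | true
driver_connections | 16
hub_ip_addresses | 10.200.241.156
edge_replication | true
"""

settings = parse_list_conf(sample)
print(settings["hub_ip_addresses"])
```

All values are kept as strings; convert them to numbers or booleans only where your script needs to.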
Configuring channel settings
CREATE TABLE dse_system.advrep_table_conf (
keyspace_name text,
table_name text,
edge_id text,
edge_id_col_name text,
enabled boolean,
hub_keyspace_name text,
hub_table_name text,
priority int,
truncate_timestamp timestamp,
PRIMARY KEY (keyspace_name, table_name)
)
cqlsh> select * from dse_system.advrep_table_conf;
The result is:
keyspace_name | table_name | edge_id | edge_id_col_name | enabled | hub_keyspace_name | hub_table_name | ...
---------------+------------+---------+------------------+---------+-------------------+----------------+-...
foo | bar | null | null | True | foo | bar | ...
(1 rows)
dse advrep edge channel ...
You can provide authentication credentials in several ways; see Credentials for authentication.
Column name | Description |
---|---|
keyspace-name | The keyspace on the edge for the table to replicate. |
table-name | The table name on the edge to replicate. |
enabled | If true, replication will start for this table. If false, no more messages from this table will be saved to the replication log. |
hub-keyspace-name | The keyspace on the hub for the replicated table. |
hub-table-name | The table name on the hub for the replicated table. |
priority | Messages are marked by priority DESC. |
edge-id | Placeholder to override the edge-id that is defined in the advrep_conf metadata. |
edge-id-col-name | Placeholder to override the edge-id-col-name that is defined in advrep_conf metadata. |
truncate-timestamp | Truncates data in the replication log for this channel that is older than this timestamp. Truncated data is not replicated. |
dse advrep edge channel status
The results are:
keyspace_name | table_name | edge_id | edge_id_col_name | enabled | hub_keyspace_name | hub_table_name | priority | truncate_timestamp
--------------------------------------------------------------------------------------------------------------------------------------
advrep | loki | | | True | advrep | loki | 2 | None
advrep | thor | | | True | advrep | thor | 1 | None
Encrypting driver passwords
- In the dse.yaml file:
  - Verify that the config_encryption_active property is false:
    config_encryption_active: false
  - Enable driver password encryption with the conf_driver_password_encryption_enabled property:
    conf_driver_password_encryption_enabled: true
  - Define where system keys are stored on disk with the system_key_directory property:
    system_key_directory: /etc/dse/conf
    The default value is /etc/dse/conf.
  - Specify that encryption keys are generated as system keys with the config_encryption_key_name property:
    config_encryption_key_name: system_key
- Generate a system key:
  On-server:
  dsetool createsystemkey cipher_algorithm strength system_key_file
  Off-server:
  dsetool createsystemkey cipher_algorithm strength system_key_file -kmip=kmip_groupname
  For example:
  dsetool createsystemkey 'AES/ECB/PKCS5Padding' 128 system_key_file
  where system_key_file is a unique file name for the generated system key file. See createsystemkey.
  Result: You can create a global encryption key in the location that is specified by system_key_directory in the dse.yaml file. This default global encryption key is used when the system_key_file subproperty is not specified.
- Copy the returned value.
- On any node in the edge cluster, use the dse command to set the encrypted password in the DSE Advanced Replication environment:
  dse advrep edge conf --driver-pwd "Sa9xOVaym7bddjXUT/eeOQ==" --driver-user "username"
- In dse.yaml, set the conf_driver_password_encryption_enabled property to true:
  conf_driver_password_encryption_enabled: true
- Start dse.
Data insert methods
There are several ways to get data into a DataStax Enterprise cluster. Because DSE Advanced Replication relies on Cassandra triggers, any data insert that goes through the normal path is supported. Methods that do not use Cassandra triggers do not cause data replication.
Supported data insert methods:
- CQL insert, including cqlsh and applications that use the standard Cassandra drivers
- Copy from a CSV file
- Solr HTTP or CQL
- Spark saveToCassandra
Unsupported data insert methods:
- Tables that are defined for compact storage
- sstableloader (Cassandra bulk loader)
- OpsCenter restore from backup
- Spark bulkSaveToCassandra
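The location of the dse.yaml file depends on the type of installation:
Installation type | Location of dse.yaml |
---|---|
Installer-Services | /etc/dse/dse.yaml |
Package installations | /etc/dse/dse.yaml |
Installer-No Services | install_location/resources/dse/conf/dse.yaml |
Tarball installations | install_location/resources/dse/conf/dse.yaml |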