Getting started with DSE Advanced Replication
Steps to set up source and destination DataStax Enterprise clusters to test DSE Advanced Replication.
dse.yaml
The location of the dse.yaml file depends on the type of installation:
Package installations | /etc/dse/dse.yaml |
Tarball installations | installation_location/resources/dse/conf/dse.yaml |
cassandra.yaml
The location of the cassandra.yaml file depends on the type of installation:
Package installations | /etc/dse/cassandra/cassandra.yaml |
Tarball installations | installation_location/resources/cassandra/conf/cassandra.yaml |
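If you script the configuration edits on several nodes, the two location tables above can be captured in a small lookup. This is a minimal Python sketch, not part of DSE; the function name `config_path` and the example directory `/opt/dse` are hypothetical, and `installation_location` stands for wherever the tarball was extracted.

```python
from pathlib import PurePosixPath

# Locations from the tables above. Package paths are absolute;
# tarball paths are relative to installation_location.
CONFIG_LOCATIONS = {
    ("package", "dse.yaml"): "/etc/dse/dse.yaml",
    ("package", "cassandra.yaml"): "/etc/dse/cassandra/cassandra.yaml",
    ("tarball", "dse.yaml"): "resources/dse/conf/dse.yaml",
    ("tarball", "cassandra.yaml"): "resources/cassandra/conf/cassandra.yaml",
}


def config_path(install_type, filename, installation_location=None):
    """Return the expected location of dse.yaml or cassandra.yaml."""
    location = CONFIG_LOCATIONS[(install_type, filename)]
    if install_type == "tarball":
        if not installation_location:
            raise ValueError("tarball installations need installation_location")
        return PurePosixPath(installation_location, location)
    return PurePosixPath(location)
```

For example, `config_path("tarball", "dse.yaml", "/opt/dse")` yields `/opt/dse/resources/dse/conf/dse.yaml`.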
To test Advanced Replication, you must set up a source cluster and a destination cluster. These steps set up one node in each cluster.
- Setting up the destination cluster node
- Setting up the source cluster
- Creating the sample keyspace and table
- Configuring a replication destination on the source node
- Creating the replication channel
- Starting replication from source to destination
- Inserting data on the source
- Testing loss of connectivity
- Testing replication start and stop
Setting up the destination cluster node
- Install DataStax Enterprise.
- Start DataStax Enterprise as a transactional node with the command that is appropriate for the installation method.
- Note the public IP address for the destination node.
Setting up the source cluster
- Install DataStax Enterprise 5.1 or later.
- To enable replication, edit the dse.yaml file. At the end of the file, uncomment the advanced_replication_options setting and its options, and set enabled to true:
# Advanced Replication configuration settings
advanced_replication_options:
  enabled: true
- Enable Change Data Capture (CDC) in the cassandra.yaml file on a per-node basis for each source node:
cdc_enabled: true
Note: Advanced Replication does not start if CDC is not enabled, because CDC logs are used to implement the feature.
- Depending on the load, consider increasing the CDC disk space (default: the smaller of 4096 MB and 1/8 of the total space of the drive where cdc_raw_directory resides):
cdc_total_space_in_mb: 16384
- Commit log compression is turned off by default. To avoid problems with Advanced Replication, do not enable it; ensure that the option remains commented out:
# commitlog_compression:
#   - class_name: LZ4Compressor
- Start DataStax Enterprise as a transactional node with the command that is appropriate for the installation method.
- After Advanced Replication starts on a cluster, the source node creates keyspaces and tables that need to be altered. See Keyspaces for information.
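Before starting DSE on the source node, the settings from the steps above can be checked by a script. The following Python sketch is not a DSE tool: it assumes both files have already been parsed into dictionaries (for example with PyYAML's `yaml.safe_load`), and the helper names are hypothetical. It also computes the default CDC space described above, the smaller of 4096 MB and one eighth of the drive that holds `cdc_raw_directory`; rounding down to whole megabytes is an assumption.

```python
def default_cdc_total_space_mb(cdc_drive_total_mb):
    """Default cdc_total_space_in_mb: the smaller of 4096 MB and 1/8 of the
    drive that holds cdc_raw_directory (rounded down in this sketch)."""
    return min(4096, cdc_drive_total_mb // 8)


def source_config_problems(dse_yaml, cassandra_yaml):
    """Check already-parsed dse.yaml and cassandra.yaml dictionaries against
    the source-node steps above; return a list of problems (empty if none)."""
    problems = []
    replication = dse_yaml.get("advanced_replication_options") or {}
    if replication.get("enabled") is not True:
        problems.append("dse.yaml: advanced_replication_options.enabled must be true")
    if cassandra_yaml.get("cdc_enabled") is not True:
        problems.append("cassandra.yaml: cdc_enabled must be true, "
                        "or Advanced Replication will not start")
    # A commented-out setting is absent after parsing, so any value here
    # means commit log compression was switched on.
    if cassandra_yaml.get("commitlog_compression") is not None:
        problems.append("cassandra.yaml: commitlog_compression should stay commented out")
    return problems
```

On a drive of 8192 MB, for instance, the default works out to 1024 MB, well below the 16384 MB suggested above for heavier loads.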
Creating the sample keyspace and table
These steps show you how to create the demonstration keyspace and table.
- On the source node and the destination node, create the sample keyspace:
CREATE KEYSPACE foo WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor':1};
Remember: Use escaped quotes around keyspace and table names in command-line arguments to preserve their case:
dse advrep create --keyspace \"keyspaceName\" --table \"tableName\"
- On the source node:
CREATE TABLE foo.bar ( name TEXT, val TEXT, scalar INT, PRIMARY KEY (name));
- On the destination node:
CREATE TABLE foo.bar ( name TEXT, val TEXT, scalar INT, source_id TEXT, PRIMARY KEY (name, source_id));
Note: Including a source_id column in the destination table is recommended. If the destination table already has a primary key column that uniquely identifies the source from which the data is replicated, source_id is not required in the primary key. The source_id column prevents overwrites when two records with the same primary key are replicated from different sources and you want to keep both records.
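The reason for adding source_id to the primary key can be shown with a toy model. CQL inserts are upserts, so a write with an existing primary key replaces the earlier row. This Python sketch treats a table as a dictionary keyed by the primary key and ignores write timestamps and cell-level merging; the second source, `source2`, is hypothetical.

```python
def load_table(rows, primary_key):
    """Apply rows in order to a dict keyed by primary key, like CQL upserts."""
    table = {}
    for row in rows:
        key = tuple(row[column] for column in primary_key)
        table[key] = row  # same key: the later row replaces the earlier one
    return table


# Two sources each replicate a row whose name is 'a'.
incoming = [
    {"name": "a", "val": "1", "scalar": 1, "source_id": "source1"},
    {"name": "a", "val": "9", "scalar": 9, "source_id": "source2"},  # hypothetical second source
]

without_source_id = load_table(incoming, ["name"])
with_source_id = load_table(incoming, ["name", "source_id"])
```

With PRIMARY KEY (name) only the row from source2 survives; with PRIMARY KEY (name, source_id), as in the destination table above, both rows are kept.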
Configuring a replication destination on the source node
DSE Advanced Replication stores all of its settings in CQL tables. To configure replication, use the dse advrep command line tool.
- The source node points to its destination using the public IP address that you noted earlier.
- The source-id value is a unique identifier for all data that comes from this particular source node.
- The source-id unique identifier is written to the source-id-column (source_id in this example) that was included when the foo.bar table was created on the destination node.
dse advrep --verbose destination create --name mydest --addresses 10.200.182.148 --transmission-enabled true
Destination mydest created
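When you script this step, building the argument list explicitly avoids shell-quoting mistakes. The Python sketch below reproduces only the flags shown in the command above; the helper names are hypothetical, and `run_advrep` simply wraps `subprocess.run`, so it needs the `dse` executable on the PATH.

```python
import shlex
import subprocess


def destination_create_argv(name, addresses, transmission_enabled=True, verbose=True):
    """Arguments for `dse advrep destination create`, limited to the flags shown above."""
    argv = ["dse", "advrep"]
    if verbose:
        argv.append("--verbose")
    argv += [
        "destination", "create",
        "--name", name,
        "--addresses", addresses,  # public IP address of the destination node
        "--transmission-enabled", "true" if transmission_enabled else "false",
    ]
    return argv


def run_advrep(argv):
    """Run a dse advrep command and return its standard output."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


# Print the command for review instead of running it.
print(shlex.join(destination_create_argv("mydest", "10.200.182.148")))
```

Passing a list to `subprocess.run` (rather than a shell string) means values such as quoted, case-sensitive names reach `dse` unchanged.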
To verify the configuration, run this command:
dse advrep destination list-conf
--------------------------------------------------------------------------------------------
|destination|name |value |
--------------------------------------------------------------------------------------------
|mydest |driver_ssl_enabled |false |
--------------------------------------------------------------------------------------------
|mydest |addresses |10.200.182.148 |
--------------------------------------------------------------------------------------------
|mydest |driver_read_timeout |15000 |
--------------------------------------------------------------------------------------------
|mydest |driver_connections_max |8 |
--------------------------------------------------------------------------------------------
|mydest |source_id_column |source_id |
--------------------------------------------------------------------------------------------
|mydest |driver_connect_timeout |15000 |
--------------------------------------------------------------------------------------------
|mydest |driver_ssl_protocol |TLS |
--------------------------------------------------------------------------------------------
|mydest |driver_consistency_level |QUORUM |
--------------------------------------------------------------------------------------------
|mydest |driver_used_hosts_per_remote_dc |0 |
--------------------------------------------------------------------------------------------
|mydest |driver_allow_remote_dcs_for_local_cl|false |
--------------------------------------------------------------------------------------------
|mydest |driver_compression |lz4 |
--------------------------------------------------------------------------------------------
|mydest |driver_connections |1 |
--------------------------------------------------------------------------------------------
|mydest |driver_ssl_cipher_suites |[TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384,|
| | |, |
| | |TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384, |
| | |, |
| | |TLS_RSA_WITH_AES_256_CBC_SHA256, |
| | |, |
| | |TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384, |
| | |, |
| | |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384, |
| | |, |
| | |TLS_DHE_RSA_WITH_AES_256_CBC_SHA256, |
| | |, |
| | |TLS_DHE_DSS_WITH_AES_256_CBC_SHA256, |
| | |, |
| | |TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA, |
| | |, |
| | |TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, |
| | |, |
| | |TLS_RSA_WITH_AES_256_CBC_SHA, |
| | |, |
| | |TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA, |
| | |, |
| | |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA, |
| | |, |
| | |TLS_DHE_RSA_WITH_AES_256_CBC_SHA, |
| | |, |
| | |TLS_DHE_DSS_WITH_AES_256_CBC_SHA, |
| | |, |
| | |TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, |
| | |, |
| | |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256, |
| | |, |
| | |TLS_RSA_WITH_AES_128_CBC_SHA256, |
| | |, |
| | |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256, |
| | |, |
| | |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256, |
| | |, |
| | |TLS_DHE_RSA_WITH_AES_128_CBC_SHA256, |
| | |, |
| | |TLS_DHE_DSS_WITH_AES_128_CBC_SHA256, |
| | |, |
| | |TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, |
| | |, |
| | |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, |
| | |, |
| | |TLS_RSA_WITH_AES_128_CBC_SHA, |
| | |, |
| | |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA, |
| | |, |
| | |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA, |
| | |, |
| | |TLS_DHE_RSA_WITH_AES_128_CBC_SHA, |
| | |, |
| | |TLS_DHE_DSS_WITH_AES_128_CBC_SHA, |
| | |, |
| | |TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, |
| | |, |
| | |TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, |
| | |, |
| | |TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, |
| | |, |
| | |TLS_RSA_WITH_AES_256_GCM_SHA384, |
| | |, |
| | |TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384, |
| | |, |
| | |TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384, |
| | |, |
| | |TLS_DHE_RSA_WITH_AES_256_GCM_SHA384, |
| | |, |
| | |TLS_DHE_DSS_WITH_AES_256_GCM_SHA384, |
| | |, |
| | |TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, |
| | |, |
| | |TLS_RSA_WITH_AES_128_GCM_SHA256, |
| | |, |
| | |TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256, |
| | |, |
| | |TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256, |
| | |, |
| | |TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, |
| | |, |
| | |TLS_DHE_DSS_WITH_AES_128_GCM_SHA256, |
| | |, |
| | |TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA, |
| | |, |
| | |TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA, |
| | |, |
| | |SSL_RSA_WITH_3DES_EDE_CBC_SHA, |
| | |, |
| | |TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA, |
| | |, |
| | |TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA, |
| | |, |
| | |SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA, |
| | |, |
| | |SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA, |
| | |, |
| | |TLS_ECDHE_ECDSA_WITH_RC4_128_SHA, |
| | |, |
| | |TLS_ECDHE_RSA_WITH_RC4_128_SHA, |
| | |, |
| | |SSL_RSA_WITH_RC4_128_SHA, |
| | |, |
| | |TLS_ECDH_ECDSA_WITH_RC4_128_SHA, |
| | |, |
| | |TLS_ECDH_RSA_WITH_RC4_128_SHA, |
| | |, |
| | |SSL_RSA_WITH_RC4_128_MD5, |
| | |, |
| | |TLS_EMPTY_RENEGOTIATION_INFO_SCSV] |
--------------------------------------------------------------------------------------------
|mydest |source_id |source1 |
--------------------------------------------------------------------------------------------
|mydest |transmission_enabled |true |
--------------------------------------------------------------------------------------------
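Because list-conf prints a text table, a script that verifies the configuration has to parse it. The Python sketch below assumes the exact layout shown above: three `|`-separated cells per row, `-` separator lines, and continuation rows (such as those of driver_ssl_cipher_suites) whose first two cells are empty. The parser and its names are hypothetical, not a DSE API, and other DSE versions may format the output differently.

```python
def parse_list_conf(output):
    """Parse `dse advrep destination list-conf` output into {(destination, name): value}."""
    conf = {}
    last_key = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip blank lines and the ----- separators
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 3 or cells[:2] == ["destination", "name"]:
            continue  # skip the header and anything unexpected
        destination, name, value = cells
        if not destination and not name and last_key is not None:
            conf[last_key] += value  # continuation of a multi-line value
        else:
            last_key = (destination, name)
            conf[last_key] = value
    return conf


def as_list(value):
    """Split a bracketed, comma-separated value such as driver_ssl_cipher_suites."""
    return [item.strip() for item in value.strip("[]").split(",") if item.strip()]
```

Applied to the output above, `conf[("mydest", "source_id_column")]` is `source_id`, and `as_list` drops the empty entries produced by the `,` continuation rows.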
Creating the replication channel
A replication channel is a defined channel of change data between source clusters and destination clusters. A replication channel is defined by the source cluster, source keyspace, source table name, destination cluster, destination keyspace, and destination table name. Source clusters can span multiple datacenters, but a replication channel is configured with only one datacenter as the responsible party. The keyspace and table names on the destination can be different from those on the source, but in this example they are the same. You can also set the source-id and source-id-column differently from the global setting.
dse advrep channel create --source-keyspace foo --source-table bar --source-id source1 --source-id-column source_id --destination mydest --destination-keyspace foo --destination-table bar --collection-enabled true --transmission-enabled true --priority 1
Created channel dc=Cassandra keyspace=foo table=bar to mydest
dse advrep channel status
------------------------------------------------------------------------------------------------------------------------------------------------------
|dc |keyspace|table |collecting|transmitting|replication order|priority|dest ks|dest table |src id |src id col|dest |dest enabled|
------------------------------------------------------------------------------------------------------------------------------------------------------
|Cassandra|foo |bar |true |true |FIFO |1 |foo |bar |source1|source_id |mydest|true |
------------------------------------------------------------------------------------------------------------------------------------------------------
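The channel create command and the status table above describe the same channel from two sides. This Python sketch models the channel as a dataclass, rebuilds the command from it using only the flags shown above, and parses the status table (assuming the layout above) so a script can confirm that the channel it created is listed. All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """The coordinates that define a replication channel in this walkthrough."""
    source_keyspace: str
    source_table: str
    destination: str
    destination_keyspace: str
    destination_table: str
    source_id: str
    source_id_column: str
    priority: int = 1

    def create_argv(self):
        """Arguments for `dse advrep channel create`, using the flags shown above."""
        return [
            "dse", "advrep", "channel", "create",
            "--source-keyspace", self.source_keyspace,
            "--source-table", self.source_table,
            "--source-id", self.source_id,
            "--source-id-column", self.source_id_column,
            "--destination", self.destination,
            "--destination-keyspace", self.destination_keyspace,
            "--destination-table", self.destination_table,
            "--collection-enabled", "true",
            "--transmission-enabled", "true",
            "--priority", str(self.priority),
        ]


def parse_channel_status(output):
    """Turn `dse advrep channel status` output into one dict per channel row."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in output.splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def status_matches(channel, row):
    """True if a parsed status row describes the given channel."""
    return (
        row["keyspace"], row["table"], row["dest"], row["dest ks"],
        row["dest table"], row["src id"], row["src id col"],
    ) == (
        channel.source_keyspace, channel.source_table, channel.destination,
        channel.destination_keyspace, channel.destination_table,
        channel.source_id, channel.source_id_column,
    )
```

Keeping the channel definition in one object means the create command and the post-creation check cannot drift apart.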
Note: If the source keyspace has durable_writes = false, an error message is displayed and the channel is not created. If the durable writes setting is changed to false after the replication channel is created, the tables do not write to the commit log and CDC does not work. The data is not ingested through the replication channel, and although a warning is logged, the failure is otherwise silent.
Starting replication from source to destination
- On the destination, use cqlsh to verify that no data is present:
SELECT * FROM foo.bar;
 name | source_id | scalar | val
------+-----------+--------+-----

(0 rows)
- On the source, replication to the destination can be paused or resumed. To resume it:
dse advrep channel resume --source-keyspace foo --source-table bar --transmission
Channel dc=Cassandra keyspace=foo table=bar collection to mydest was resumed
Either --transmission or --collection can be specified: --transmission resumes transmission from the source to the destination, and --collection resumes collection of data on the source.
- Review the number of records that are in the replication log. Because no data has been inserted yet, the record count in the replication log is 0:
dse advrep replog count --destination mydest --source-keyspace foo --source-table bar
0
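The replog count check recurs throughout this walkthrough, so it is worth wrapping once in a script. This Python sketch assumes the command prints only the integer count, as in the examples here; `run` defaults to `subprocess.run` and can be replaced with a stub for testing. The helper names are hypothetical.

```python
import subprocess


def replog_count_argv(destination, source_keyspace, source_table):
    """Arguments for `dse advrep replog count`, as used in this walkthrough."""
    return [
        "dse", "advrep", "replog", "count",
        "--destination", destination,
        "--source-keyspace", source_keyspace,
        "--source-table", source_table,
    ]


def replog_count(destination, source_keyspace, source_table, run=subprocess.run):
    """Return the number of records waiting in the replication log.

    Assumes the command prints only the integer count, as in the examples above.
    """
    result = run(
        replog_count_argv(destination, source_keyspace, source_table),
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())
```

On a live source node, `replog_count("mydest", "foo", "bar")` runs the same command as above and returns its count as an integer.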
Inserting data on the source
- On the source, insert data using cqlsh:
INSERT INTO foo.bar (name, val, scalar) VALUES ('a', '1', 1);
INSERT INTO foo.bar (name, val, scalar) VALUES ('b', '2', 2);
- On the destination, verify that the data was replicated:
SELECT * FROM foo.bar;
 name | source_id | scalar | val
------+-----------+--------+-----
    a |   source1 |      1 |   1
    b |   source1 |      2 |   2

(2 rows)
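The replicated rows are the source rows plus the source-id column that the channel writes on the destination. This Python sketch computes the rows you would expect to see for the two inserts above. Sorting is needed before comparing, because cqlsh typically returns rows in partition token order rather than insertion order, as the four-row result in the connectivity test shows. The function name is hypothetical.

```python
def expected_destination_rows(source_rows, source_id, source_id_column="source_id"):
    """Source rows plus the source-id column written on the destination,
    sorted by name so they can be compared with cqlsh output regardless of order."""
    rows = [{**row, source_id_column: source_id} for row in source_rows]
    return sorted(rows, key=lambda row: (row["name"], row[source_id_column]))


# The two rows inserted on the source above.
inserted = [
    {"name": "a", "val": "1", "scalar": 1},
    {"name": "b", "val": "2", "scalar": 2},
]
expected = expected_destination_rows(inserted, "source1")
```

The `{**row, ...}` copy leaves the source rows unchanged, so the same list can be reused to build expectations for another source-id.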
Testing loss of connectivity
- On the destination cluster, stop DataStax Enterprise:
dse cassandra-stop
- On the source, insert more data:
INSERT INTO foo.bar (name, val, scalar) VALUES ('c', '3', 3);
INSERT INTO foo.bar (name, val, scalar) VALUES ('d', '4', 4);
- Review the number of records that are in the replication log. The replication log should have 2 entries:
dse advrep replog count --destination mydest --source-keyspace foo --source-table bar
2
- On the destination, restart DataStax Enterprise:
dse cassandra
Wait a moment for communication to resume and for the new records to replicate from the source to the destination, then verify the data on the destination:
SELECT * FROM foo.bar;
 name | source_id | scalar | val
------+-----------+--------+-----
    a |   source1 |      1 |   1
    c |   source1 |      3 |   3
    d |   source1 |      4 |   4
    b |   source1 |      2 |   2

(4 rows)
- On the source, the replication log count should be back to 0:
dse advrep replog count --destination mydest --source-keyspace foo --source-table bar
0
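Instead of waiting an arbitrary amount of time for the destination to catch up, a script can poll the replication log count until it drops back to 0. In this Python sketch, `get_count` is any zero-argument callable that runs `dse advrep replog count ...` and returns the integer it prints; the default timeout and polling interval are arbitrary assumptions, and the function name is hypothetical.

```python
import time


def wait_for_replog_drain(get_count, timeout_s=60.0, interval_s=2.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll get_count() until it returns 0 or timeout_s elapses.

    Returns the list of counts observed, ending in 0 on success.
    Raises TimeoutError if the replication log has not drained in time.
    """
    observed = []
    deadline = clock() + timeout_s
    while True:
        count = get_count()
        observed.append(count)
        if count == 0:
            return observed
        if clock() >= deadline:
            raise TimeoutError(f"replication log still has {count} entries")
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be tested without real delays; in normal use the defaults apply.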
Testing replication start and stop
- On the source, pause the replication channel:
dse advrep --verbose channel pause --keyspace foo --table bar --collection
- Insert more data.
- On the source, resume the replication channel:
dse advrep --verbose channel resume --keyspace foo --table bar --collection
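The pause and resume commands in this section differ only in the action and in which flag is passed. This Python sketch builds them with the flags used here (`--keyspace`, `--table`); note that the earlier resume example uses `--source-keyspace` and `--source-table` instead, so confirm the flag names for your DSE version. Accepting exactly one of `--collection` or `--transmission` is a conservative reading of the text above, and the helper name is hypothetical.

```python
def channel_toggle_argv(action, keyspace, table, collection=False,
                        transmission=False, verbose=True):
    """Arguments for `dse advrep channel pause|resume` with the flags used in this section."""
    if action not in ("pause", "resume"):
        raise ValueError("action must be 'pause' or 'resume'")
    if collection == transmission:
        raise ValueError("pass exactly one of collection=True or transmission=True")
    argv = ["dse", "advrep"]
    if verbose:
        argv.append("--verbose")
    argv += ["channel", action, "--keyspace", keyspace, "--table", table]
    argv.append("--collection" if collection else "--transmission")
    return argv
```

For example, `channel_toggle_argv("pause", "foo", "bar", collection=True)` produces the same arguments as the pause command above.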