Getting started with DSE Advanced Replication
To test Advanced Replication, you must set up a source cluster and a destination cluster. These steps set up one node in each cluster.
Getting started overview:
Due to CASSANDRA-11368, list inserts might not be idempotent: applying the same insert more than once can change the result. Because DSE Advanced Replication might deliver the same message to the destination more than once, this Cassandra bug might lead to data inconsistency if lists are used in a column family schema. DataStax recommends using other collection types, like sets or frozen lists, when ordering is not important.
Prerequisite: If you are using Advanced Replication V1 from DSE 5.0, you must upgrade to DSE 5.1 and migrate to Advanced Replication V2.
Setting up the destination cluster node
The destination cluster requires DataStax Enterprise 4.8 or later. On the destination node:
-
Install DataStax Enterprise 4.8 or later.
-
Start DataStax Enterprise as a transactional node with the command that is appropriate for the installation method.
-
Note the public IP address for the destination node.
Setting up the source cluster
The source cluster requires DataStax Enterprise 5.1 or later. On the source node:
-
Install DataStax Enterprise 5.1 or later.
-
To enable replication, edit the dse.yaml file. The location of the dse.yaml file depends on the type of installation:
Package installations and Installer-Services installations: /etc/dse/dse.yaml
Tarball installations and Installer-No Services installations: <installation_location>/resources/dse/conf/dse.yaml
At the end of the file, uncomment the advanced_replication_options setting and options, and set enabled: true.
# Advanced Replication configuration settings
advanced_replication_options:
    enabled: true
-
Enable change data capture (CDC) in the cassandra.yaml file on a per-node basis for each source:
cdc_enabled: true
The location of the cassandra.yaml file depends on the type of installation:
Package installations and Installer-Services installations: /etc/dse/cassandra/cassandra.yaml
Tarball installations and Installer-No Services installations: <installation_location>/resources/cassandra/conf/cassandra.yaml
Advanced Replication does not start if CDC is not enabled, because CDC logs are used to implement the feature.
-
Consider increasing the CDC disk space, depending on the load. The default is the smaller of 4096 MB and 1/8 of the total space of the drive where cdc_raw_directory resides:
cdc_total_space_in_mb: 16384
-
Commitlog compression is turned off by default. To avoid problems with Advanced Replication, do not enable it; ensure that the option is commented out:
# commitlog_compression:
#   - class_name: LZ4Compressor
-
Start DataStax Enterprise as a transactional node with the command that is appropriate for the installation method.
-
Once advanced replication is started on a cluster, the source node creates keyspaces and tables that need alteration. See Keyspaces for information.
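The configuration steps above can be checked before the source node is started. The following is a minimal sketch, not something shipped with DSE: check_advrep_prereqs is a hypothetical helper name, and the script assumes the settings are written as unindented top-level keys in the form shown in this guide.

```shell
# check_advrep_prereqs DSE_YAML CASSANDRA_YAML
# Hypothetical helper (not shipped with DSE). Prints one line per missing
# prerequisite and returns non-zero if any check fails.
check_advrep_prereqs() {
  dse_yaml=$1
  cassandra_yaml=$2
  problems=0

  # "enabled: true" must be uncommented inside advanced_replication_options.
  if ! awk '
    /^advanced_replication_options:/ { in_block = 1; next }
    in_block && /^[^ \t#]/           { in_block = 0 }
    in_block && /^[ \t]+enabled:[ \t]*true[ \t]*$/ { found = 1 }
    END { exit !found }
  ' "$dse_yaml"; then
    echo "advanced_replication_options is not enabled in $dse_yaml"
    problems=$((problems + 1))
  fi

  # CDC must be enabled on every source node.
  if ! grep -Eq '^cdc_enabled:[[:space:]]*true[[:space:]]*$' "$cassandra_yaml"; then
    echo "cdc_enabled is not true in $cassandra_yaml"
    problems=$((problems + 1))
  fi

  # Commitlog compression must stay commented out.
  if grep -Eq '^commitlog_compression:' "$cassandra_yaml"; then
    echo "commitlog_compression is active in $cassandra_yaml"
    problems=$((problems + 1))
  fi

  [ "$problems" -eq 0 ]
}
```

On a package installation, this could be called as check_advrep_prereqs /etc/dse/dse.yaml /etc/dse/cassandra/cassandra.yaml. The check is textual only; it does not validate the YAML or confirm that the node has been restarted with the new settings.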
Creating the sample keyspace and table
These steps show you how to create the demonstration keyspace and table.
-
On the source node and the destination node, create the sample keyspace:
CREATE KEYSPACE foo WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor':1};
Remember to use escaped quotes around keyspace and table names as command line arguments to preserve casing:
dse advrep create --keyspace \"keyspaceName\" --table \"tableName\"
-
On the source node:
CREATE TABLE foo.bar ( name TEXT, val TEXT, scalar INT, PRIMARY KEY (name));
-
On the destination node:
CREATE TABLE foo.bar ( name TEXT, val TEXT, scalar INT, source_id TEXT, PRIMARY KEY (name, source_id));
The source_id column is recommended on the destination table. If the destination table has a field in the primary key that uniquely determines the source from which the data is replicated, the source_id column is not required as part of the primary key. The source_id column is useful for preventing overwrites if two records with the same primary key are replicated from different sources and you want to keep both records.
Configuring a replication destination on the source node
DSE Advanced Replication stores all of its settings in CQL tables. To configure replication, use the dse advrep command line tool.
When you configure replication on the source node:
-
The source node points to its destination using the public IP address that you saved earlier.
-
The source-id value is a unique identifier for all data that comes from this particular source node.
-
The source-id unique identifier is written to the source-id-column that was included when the foo.bar table was created on the destination node.
To configure a replication destination, run this command:
$ dse advrep --verbose destination create --name mydest --addresses 10.200.182.148 --transmission-enabled true
Destination mydest created
To verify the configuration, run this command:
$ dse advrep destination list-conf
--------------------------------------------------------------------------------------------
|destination|name |value |
--------------------------------------------------------------------------------------------
|mydest |driver_ssl_enabled |false |
--------------------------------------------------------------------------------------------
|mydest |addresses |10.200.182.148 |
--------------------------------------------------------------------------------------------
|mydest |driver_read_timeout |15000 |
--------------------------------------------------------------------------------------------
|mydest |driver_connections_max |8 |
--------------------------------------------------------------------------------------------
|mydest |source_id_column |source_id |
--------------------------------------------------------------------------------------------
|mydest |driver_connect_timeout |15000 |
--------------------------------------------------------------------------------------------
|mydest |driver_ssl_protocol |TLS |
--------------------------------------------------------------------------------------------
|mydest |driver_consistency_level |QUORUM |
--------------------------------------------------------------------------------------------
|mydest |driver_used_hosts_per_remote_dc |0 |
--------------------------------------------------------------------------------------------
|mydest |driver_allow_remote_dcs_for_local_cl|false |
--------------------------------------------------------------------------------------------
|mydest |driver_compression |lz4 |
--------------------------------------------------------------------------------------------
|mydest |driver_connections |1 |
--------------------------------------------------------------------------------------------
|mydest |driver_ssl_cipher_suites |[TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384,|
| | |, |
| | |TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384, |
| | |, |
| | |TLS_RSA_WITH_AES_256_CBC_SHA256, |
| | |, |
| | |TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384, |
| | |, |
| | |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384, |
| | |, |
| | |TLS_DHE_RSA_WITH_AES_256_CBC_SHA256, |
| | |, |
| | |TLS_DHE_DSS_WITH_AES_256_CBC_SHA256, |
| | |, |
| | |TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA, |
| | |, |
| | |TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, |
| | |, |
| | |TLS_RSA_WITH_AES_256_CBC_SHA, |
| | |, |
| | |TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA, |
| | |, |
| | |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA, |
| | |, |
| | |TLS_DHE_RSA_WITH_AES_256_CBC_SHA, |
| | |, |
| | |TLS_DHE_DSS_WITH_AES_256_CBC_SHA, |
| | |, |
| | |TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, |
| | |, |
| | |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256, |
| | |, |
| | |TLS_RSA_WITH_AES_128_CBC_SHA256, |
| | |, |
| | |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256, |
| | |, |
| | |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256, |
| | |, |
| | |TLS_DHE_RSA_WITH_AES_128_CBC_SHA256, |
| | |, |
| | |TLS_DHE_DSS_WITH_AES_128_CBC_SHA256, |
| | |, |
| | |TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, |
| | |, |
| | |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, |
| | |, |
| | |TLS_RSA_WITH_AES_128_CBC_SHA, |
| | |, |
| | |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA, |
| | |, |
| | |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA, |
| | |, |
| | |TLS_DHE_RSA_WITH_AES_128_CBC_SHA, |
| | |, |
| | |TLS_DHE_DSS_WITH_AES_128_CBC_SHA, |
| | |, |
| | |TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, |
| | |, |
| | |TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, |
| | |, |
| | |TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, |
| | |, |
| | |TLS_RSA_WITH_AES_256_GCM_SHA384, |
| | |, |
| | |TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384, |
| | |, |
| | |TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384, |
| | |, |
| | |TLS_DHE_RSA_WITH_AES_256_GCM_SHA384, |
| | |, |
| | |TLS_DHE_DSS_WITH_AES_256_GCM_SHA384, |
| | |, |
| | |TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, |
| | |, |
| | |TLS_RSA_WITH_AES_128_GCM_SHA256, |
| | |, |
| | |TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256, |
| | |, |
| | |TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256, |
| | |, |
| | |TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, |
| | |, |
| | |TLS_DHE_DSS_WITH_AES_128_GCM_SHA256, |
| | |, |
| | |TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA, |
| | |, |
| | |TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA, |
| | |, |
| | |SSL_RSA_WITH_3DES_EDE_CBC_SHA, |
| | |, |
| | |TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA, |
| | |, |
| | |TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA, |
| | |, |
| | |SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA, |
| | |, |
| | |SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA, |
| | |, |
| | |TLS_ECDHE_ECDSA_WITH_RC4_128_SHA, |
| | |, |
| | |TLS_ECDHE_RSA_WITH_RC4_128_SHA, |
| | |, |
| | |SSL_RSA_WITH_RC4_128_SHA, |
| | |, |
| | |TLS_ECDH_ECDSA_WITH_RC4_128_SHA, |
| | |, |
| | |TLS_ECDH_RSA_WITH_RC4_128_SHA, |
| | |, |
| | |SSL_RSA_WITH_RC4_128_MD5, |
| | |, |
| | |TLS_EMPTY_RENEGOTIATION_INFO_SCSV] |
--------------------------------------------------------------------------------------------
|mydest |source_id |source1 |
--------------------------------------------------------------------------------------------
|mydest |transmission_enabled |true |
--------------------------------------------------------------------------------------------
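Because the verification output is wide, it can help to pull out a single setting. The following is a minimal sketch that assumes the table layout shown above (pipe-delimited columns in the order destination, name, value). advrep_conf_value is a hypothetical helper name, not part of the dse advrep tool.

```shell
# advrep_conf_value DESTINATION SETTING
# Hypothetical helper. Reads `dse advrep destination list-conf` output on
# stdin and prints the value column of the first row that matches both the
# destination and the setting name. Returns non-zero if no row matches.
advrep_conf_value() {
  awk -F'[|]' -v dest="$1" -v key="$2" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    NF >= 4 && trim($2) == dest && trim($3) == key {
      print trim($4)
      found = 1
      exit
    }
    END { exit !found }
  '
}
```

For example, dse advrep destination list-conf | advrep_conf_value mydest source_id should print source1 for the configuration shown above. Values that wrap across several rows, such as driver_ssl_cipher_suites, yield only their first row.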
Creating the replication channel
A replication channel is a defined channel of change data between source clusters and destination clusters. A replication channel is defined by the source cluster, source keyspace, source table name, destination cluster, destination keyspace, and destination table name. Source clusters can exist in multi-datacenter clusters, but a replication channel is configured with only one datacenter as the responsible party.
The keyspace and table names on the destination can be different than on the source, but in this example they are the same.
You can also set the source-id and source-id-column differently from the global setting.
To create the replication channel for our keyspace and table:
$ dse advrep channel create --source-keyspace foo --source-table bar --source-id source1 --source-id-column source_id --destination mydest --destination-keyspace foo --destination-table bar --collection-enabled true --transmission-enabled true --priority 1
Created channel dc=Cassandra keyspace=foo table=bar to mydest
$ dse advrep channel status
------------------------------------------------------------------------------------------------------------------------------------------------------
|dc |keyspace|table |collecting|transmitting|replication order|priority|dest ks|dest table |src id |src id col|dest |dest enabled|
------------------------------------------------------------------------------------------------------------------------------------------------------
|Cassandra|foo |bar |true |true |FIFO |1 |foo |bar |source1|source_id |mydest|true |
------------------------------------------------------------------------------------------------------------------------------------------------------
The designated keyspace for a replication channel must have durable writes enabled.
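The channel status table can also be checked from a script, for example before inserting test data. This is a minimal sketch under the assumption that the columns appear in the order shown above (dc, keyspace, table, collecting, transmitting, and so on). channel_is_active is a hypothetical helper name.

```shell
# channel_is_active KEYSPACE TABLE
# Hypothetical helper. Reads `dse advrep channel status` output on stdin and
# returns 0 only if the channel for KEYSPACE.TABLE is listed with both
# collecting and transmitting set to true.
channel_is_active() {
  awk -F'[|]' -v ks="$1" -v tbl="$2" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    trim($3) == ks && trim($4) == tbl {
      seen = 1
      ok = (trim($5) == "true" && trim($6) == "true")
      exit
    }
    END { exit !(seen && ok) }
  '
}
```

A typical call would be dse advrep channel status | channel_is_active foo bar. If the same source table has channels to several destinations, only the first matching row is examined.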
Starting replication from source to destination
At this point, replication is configured, the replication channel is enabled, and replication has started.
-
On the destination, use cqlsh to verify that no data is present:
SELECT * FROM foo.bar;
 name | source_id | scalar | val
------+-----------+--------+-----

(0 rows)
-
On the source, replication to the destination can be paused or resumed, the latter shown here:
$ dse advrep channel resume --source-keyspace foo --source-table bar --transmission
Channel dc=Cassandra keyspace=foo table=bar collection to mydest was resumed
Notice that either --transmission or --collection can be specified, to resume transmission from the source to the destination or to resume collection of data on the source.
-
Review the number of records that are in the replication log. Because no data is inserted yet, the record count in the replication log is 0:
$ dse advrep replog count --destination mydest --source-keyspace foo --source-table bar
0
Inserting data on the source
Insert data on the source for replication to the destination.
-
On the source, insert data using cqlsh:
INSERT INTO foo.bar (name, val, scalar) VALUES ('a', '1', 1);
INSERT INTO foo.bar (name, val, scalar) VALUES ('b', '2', 2);
Checking data on the destination
Check data on the destination.
-
On the destination, verify that the data was replicated:
SELECT * FROM foo.bar;
 name | source_id | scalar | val
------+-----------+--------+-----
    a |   source1 |      1 |   1
    b |   source1 |      2 |   2

(2 rows)
Testing loss of connectivity
To test loss of connectivity to the destination, stop the DataStax Enterprise process on the destination, and insert more data on the source. The expected result is for data to be replicated quickly after the destination cluster resumes.
-
On the destination cluster, stop DataStax Enterprise:
$ dse cassandra-stop
-
On the source, insert more data:
INSERT INTO foo.bar (name, val, scalar) VALUES ('c', '3', 3); INSERT INTO foo.bar (name, val, scalar) VALUES ('d', '4', 4);
-
Review the number of records that are in the replication log. The replication log should have 2 entries:
$ dse advrep replog count --destination mydest --source-keyspace foo --source-table bar
2
-
On the destination, restart DataStax Enterprise.
$ dse cassandra
Wait a moment for communication to resume and for the new records to replicate from the source to the destination, then query the table on the destination:
SELECT * FROM foo.bar;
 name | source_id | scalar | val
------+-----------+--------+-----
    a |   source1 |      1 |   1
    c |   source1 |      3 |   3
    d |   source1 |      4 |   4
    b |   source1 |      2 |   2

(4 rows)
-
On the source, the replication log count should be back to 0:
$ dse advrep replog count --destination mydest --source-keyspace foo --source-table bar
0
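Rather than waiting and re-running the count by hand, the check can be polled. The following is a minimal sketch that assumes dse advrep replog count prints a bare number, as in the output shown above. The function names, the retry count, and the delay are all hypothetical choices for illustration.

```shell
# replog_count DESTINATION KEYSPACE TABLE
# Thin wrapper around the replog count command used in this guide.
replog_count() {
  dse advrep replog count --destination "$1" \
    --source-keyspace "$2" --source-table "$3"
}

# wait_for_replog_drain DESTINATION KEYSPACE TABLE [ATTEMPTS] [DELAY_SECONDS]
# Polls the replication log count until it reaches 0.
# Returns 0 once the log is empty, 1 if it is still non-empty after
# ATTEMPTS checks (default 30) spaced DELAY_SECONDS apart (default 2).
wait_for_replog_drain() {
  dest=$1
  ks=$2
  tbl=$3
  attempts=${4:-30}
  delay=${5:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    count=$(replog_count "$dest" "$ks" "$tbl" | tr -d '[:space:]')
    if [ "$count" = "0" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

After restarting the destination, wait_for_replog_drain mydest foo bar returns once the two queued records have been transmitted, or fails after about a minute with the default settings.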
Testing replication start and stop
As with testing loss of connectivity, you can pause and resume individual replication channels by using the dse advrep command line tool. Because these steps pause collection, the expected result is that data inserted while the channel is paused is not saved to the replication log and is never sent to the destination.
-
On the source, pause the replication channel:
$ dse advrep --verbose channel pause --source-keyspace foo --source-table bar --collection
-
Insert more data.
-
On the source, resume the replication channel:
$ dse advrep --verbose channel resume --source-keyspace foo --source-table bar --collection
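The pause, insert, and resume sequence can be scripted so that collection is always resumed afterward. This is a minimal sketch, to be run on the source node. check_collection_pause is a hypothetical wrapper, and the --source-keyspace and --source-table spelling follows the channel resume example earlier in this guide. It assumes that cqlsh on the source connects with its defaults and that no other writes reach the table while the check runs.

```shell
# check_collection_pause KEYSPACE TABLE DESTINATION INSERT_CQL
# Hypothetical wrapper around the steps above. Returns 0 when the insert
# succeeds and the replication log count is unchanged by it.
check_collection_pause() {
  ks=$1
  tbl=$2
  dest=$3
  insert_cql=$4

  dse advrep --verbose channel pause \
    --source-keyspace "$ks" --source-table "$tbl" --collection || return 1

  before=$(dse advrep replog count --destination "$dest" \
    --source-keyspace "$ks" --source-table "$tbl")
  cqlsh -e "$insert_cql"
  insert_status=$?
  after=$(dse advrep replog count --destination "$dest" \
    --source-keyspace "$ks" --source-table "$tbl")

  # Always resume collection, even if the insert failed.
  dse advrep --verbose channel resume \
    --source-keyspace "$ks" --source-table "$tbl" --collection || return 1

  [ "$insert_status" -eq 0 ] && [ "$before" = "$after" ]
}
```

For example: check_collection_pause foo bar mydest "INSERT INTO foo.bar (name, val, scalar) VALUES ('e', '5', 5);". A row inserted this way remains on the source only, so querying foo.bar on the destination afterward should not show it.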