Getting started with DSE Advanced Replication

These steps set up source and destination DataStax Enterprise clusters to test DSE Advanced Replication.

cassandra.yaml

The location of the cassandra.yaml file depends on the type of installation:
Package installations /etc/dse/cassandra/cassandra.yaml
Tarball installations installation_location/resources/cassandra/conf/cassandra.yaml

dse.yaml

The location of the dse.yaml file depends on the type of installation:
Package installations /etc/dse/dse.yaml
Tarball installations installation_location/resources/dse/conf/dse.yaml

To test Advanced Replication, you must set up a source cluster and a destination cluster. These steps set up one node in each cluster.

Getting started overview:
  1. Setting up the destination cluster node
  2. Setting up the source cluster
  3. Creating the sample keyspace and table
  4. Configuring a replication destination on the source node
  5. Creating the replication channel
  6. Starting replication from source to destination
  7. Inserting data on the source
  8. Testing loss of connectivity
  9. Testing replication start and stop
Note: Due to CASSANDRA-11368, list inserts might not be idempotent (that is, applying the same insert twice can change the result). Because DSE Advanced Replication might deliver the same message to the destination more than once, this Cassandra bug can cause data inconsistency if lists are used in a table schema. DataStax recommends using other collection types, such as sets or frozen lists, when ordering is not important.

Setting up the destination cluster node

Attention: Prerequisite: If you are using Advanced Replication V1 from DSE 5.0, you must upgrade to DSE 5.1 and migrate to Advanced Replication V2.
On the destination node:
  1. Install DataStax Enterprise.
  2. Start DataStax Enterprise as a transactional node with the command that is appropriate for the installation method.
  3. Note the public IP address for the destination node.

Setting up the source cluster

Advanced replication can operate in a mixed-version environment. The source cluster requires DataStax Enterprise 5.1 or later.
On the source node:
  1. Install DataStax Enterprise 5.1 or later.
  2. To enable replication, edit the dse.yaml file.
    At the end of the file, uncomment the advanced_replication_options setting and options, and set enabled: true.
    # Advanced Replication configuration settings
    advanced_replication_options:
        enabled: true
  3. Enable Change Data Capture (CDC) in the cassandra.yaml file on a per-node basis for each source:
    cdc_enabled: true
    Note: Advanced Replication will not start if CDC is not enabled, since CDC logs are used to implement the feature.
  4. Consider increasing the default CDC disk space, depending on the load (default: the smaller of 4096 MB or 1/8 of the total space on the drive where cdc_raw_directory resides):
    cdc_total_space_in_mb: 16384
  5. Commitlog compression is turned off by default. To avoid problems with advanced replication, this option should NOT be used; ensure that the option is commented out:
    # commitlog_compression:
    #   - class_name: LZ4Compressor
  6. Start DataStax Enterprise as a transactional node with the command that is appropriate for the installation method.
  7. Once advanced replication is started on a cluster, the source node will create keyspaces and tables that need alteration. See Keyspaces for information.
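
The cassandra.yaml settings from steps 3 and 5 can be checked before starting the node. The following is a minimal sketch, not part of DSE: the function name check_source_yaml is hypothetical, and it assumes both keys appear at the top level of the file, as they do in the stock cassandra.yaml.

```shell
# Hypothetical pre-flight check for a source node's cassandra.yaml.
# Returns 0 when CDC is enabled and commit log compression is left commented out.
check_source_yaml() {
    yaml=$1
    status=0
    # cdc_enabled must be an uncommented top-level key set to true.
    if ! grep -Eq '^cdc_enabled:[[:space:]]*true([[:space:]]|#|$)' "$yaml"; then
        echo "cdc_enabled: true is not set in $yaml" >&2
        status=1
    fi
    # An uncommented commitlog_compression key means compression is turned on.
    if grep -Eq '^commitlog_compression:' "$yaml"; then
        echo "commitlog_compression is enabled in $yaml" >&2
        status=1
    fi
    return $status
}
```

For a package installation, this could be run as check_source_yaml /etc/dse/cassandra/cassandra.yaml.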

Creating the sample keyspace and table

These steps show you how to create the demonstration keyspace and table.

  1. On the source node and the destination node, create the sample keyspace:
    CREATE KEYSPACE foo 
    WITH REPLICATION = {
       'class': 'SimpleStrategy', 
       'replication_factor':1};
    Remember: Use escaped quotes around keyspace and table names passed as command-line arguments to preserve casing: dse advrep create --keyspace \"keyspaceName\" --table \"tableName\"
  2. On the source node, create the sample table:
    CREATE TABLE foo.bar (
       name TEXT, 
       val TEXT, 
       scalar INT, 
       PRIMARY KEY (name));
  3. On the destination node, create the sample table with an additional source_id column:
    CREATE TABLE foo.bar (
       name TEXT, 
       val TEXT, 
       scalar INT, 
       source_id TEXT, 
       PRIMARY KEY (name, source_id));
    Note: The source_id column is recommended as a column to include on the destination node. If the destination table has a field in the primary key that uniquely determines the source from which the data is replicated, the source_id is not required as part of the primary key. The source_id column is useful for preventing overwrites if two records with the same primary key get replicated from different sources, and you want to keep both records.

Configuring a replication destination on the source node

DSE Advanced Replication stores all of its settings in CQL tables. To configure replication, use the dse advrep command line tool.

When you configure replication on the source node:
  • The source node points to its destination using the public IP address that you saved earlier.
  • The source-id value is a unique identifier for all data that comes from this particular source node.
  • The source-id unique identifier is written to the source-id-column that was included when the foo.bar table was created on the destination node.
To configure a replication destination, run this command:
dse advrep --verbose destination create --name mydest --addresses 10.200.182.148 --transmission-enabled true
Destination mydest created
To verify the configuration, run this command:
dse advrep destination list-conf
--------------------------------------------------------------------------------------------
|destination|name                                |value                                    |
--------------------------------------------------------------------------------------------
|mydest     |driver_ssl_enabled                  |false                                    |
--------------------------------------------------------------------------------------------
|mydest     |addresses                           |10.200.182.148                           |
--------------------------------------------------------------------------------------------
|mydest     |driver_read_timeout                 |15000                                    |
--------------------------------------------------------------------------------------------
|mydest     |driver_connections_max              |8                                        |
--------------------------------------------------------------------------------------------
|mydest     |source_id_column                    |source_id                                |
--------------------------------------------------------------------------------------------
|mydest     |driver_connect_timeout              |15000                                    |
--------------------------------------------------------------------------------------------
|mydest     |driver_ssl_protocol                 |TLS                                      |
--------------------------------------------------------------------------------------------
|mydest     |driver_consistency_level            |QUORUM                                   |
--------------------------------------------------------------------------------------------
|mydest     |driver_used_hosts_per_remote_dc     |0                                        |
--------------------------------------------------------------------------------------------
|mydest     |driver_allow_remote_dcs_for_local_cl|false                                    |
--------------------------------------------------------------------------------------------
|mydest     |driver_compression                  |lz4                                      |
--------------------------------------------------------------------------------------------
|mydest     |driver_connections                  |1                                        |
--------------------------------------------------------------------------------------------
|mydest     |driver_ssl_cipher_suites            |[TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384,|
|           |                                    |,                                        |
|           |                                    |TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384,   |
|           |                                    |,                                        |
|           |                                    |TLS_RSA_WITH_AES_256_CBC_SHA256,         |
|           |                                    |,                                        |
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384,  |
|           |                                    |,                                        |
|           |                                    |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384,    |
|           |                                    |,                                        |
|           |                                    |TLS_DHE_RSA_WITH_AES_256_CBC_SHA256,     |
|           |                                    |,                                        |
|           |                                    |TLS_DHE_DSS_WITH_AES_256_CBC_SHA256,     |
|           |                                    |,                                        |
|           |                                    |TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,    |
|           |                                    |,                                        |
|           |                                    |TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,      |
|           |                                    |,                                        |
|           |                                    |TLS_RSA_WITH_AES_256_CBC_SHA,            |
|           |                                    |,                                        |
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA,     |
|           |                                    |,                                        |
|           |                                    |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA,       |
|           |                                    |,                                        |
|           |                                    |TLS_DHE_RSA_WITH_AES_256_CBC_SHA,        |
|           |                                    |,                                        |
|           |                                    |TLS_DHE_DSS_WITH_AES_256_CBC_SHA,        |
|           |                                    |,                                        |
|           |                                    |TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, |
|           |                                    |,                                        |
|           |                                    |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,   |
|           |                                    |,                                        |
|           |                                    |TLS_RSA_WITH_AES_128_CBC_SHA256,         |
|           |                                    |,                                        |
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256,  |
|           |                                    |,                                        |
|           |                                    |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256,    |
|           |                                    |,                                        |
|           |                                    |TLS_DHE_RSA_WITH_AES_128_CBC_SHA256,     |
|           |                                    |,                                        |
|           |                                    |TLS_DHE_DSS_WITH_AES_128_CBC_SHA256,     |
|           |                                    |,                                        |
|           |                                    |TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,    |
|           |                                    |,                                        |
|           |                                    |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,      |
|           |                                    |,                                        |
|           |                                    |TLS_RSA_WITH_AES_128_CBC_SHA,            |
|           |                                    |,                                        |
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA,     |
|           |                                    |,                                        |
|           |                                    |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA,       |
|           |                                    |,                                        |
|           |                                    |TLS_DHE_RSA_WITH_AES_128_CBC_SHA,        |
|           |                                    |,                                        |
|           |                                    |TLS_DHE_DSS_WITH_AES_128_CBC_SHA,        |
|           |                                    |,                                        |
|           |                                    |TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, |
|           |                                    |,                                        |
|           |                                    |TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, |
|           |                                    |,                                        |
|           |                                    |TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,   |
|           |                                    |,                                        |
|           |                                    |TLS_RSA_WITH_AES_256_GCM_SHA384,         |
|           |                                    |,                                        |
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384,  |
|           |                                    |,                                        |
|           |                                    |TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384,    |
|           |                                    |,                                        |
|           |                                    |TLS_DHE_RSA_WITH_AES_256_GCM_SHA384,     |
|           |                                    |,                                        |
|           |                                    |TLS_DHE_DSS_WITH_AES_256_GCM_SHA384,     |
|           |                                    |,                                        |
|           |                                    |TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,   |
|           |                                    |,                                        |
|           |                                    |TLS_RSA_WITH_AES_128_GCM_SHA256,         |
|           |                                    |,                                        |
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256,  |
|           |                                    |,                                        |
|           |                                    |TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256,    |
|           |                                    |,                                        |
|           |                                    |TLS_DHE_RSA_WITH_AES_128_GCM_SHA256,     |
|           |                                    |,                                        |
|           |                                    |TLS_DHE_DSS_WITH_AES_128_GCM_SHA256,     |
|           |                                    |,                                        |
|           |                                    |TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA,   |
|           |                                    |,                                        |
|           |                                    |TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,     |
|           |                                    |,                                        |
|           |                                    |SSL_RSA_WITH_3DES_EDE_CBC_SHA,           |
|           |                                    |,                                        |
|           |                                    |TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA,    |
|           |                                    |,                                        |
|           |                                    |TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA,      |
|           |                                    |,                                        |
|           |                                    |SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA,       |
|           |                                    |,                                        |
|           |                                    |SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA,       |
|           |                                    |,                                        |
|           |                                    |TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,        |
|           |                                    |,                                        |
|           |                                    |TLS_ECDHE_RSA_WITH_RC4_128_SHA,          |
|           |                                    |,                                        |
|           |                                    |SSL_RSA_WITH_RC4_128_SHA,                |
|           |                                    |,                                        |
|           |                                    |TLS_ECDH_ECDSA_WITH_RC4_128_SHA,         |
|           |                                    |,                                        |
|           |                                    |TLS_ECDH_RSA_WITH_RC4_128_SHA,           |
|           |                                    |,                                        |
|           |                                    |SSL_RSA_WITH_RC4_128_MD5,                |
|           |                                    |,                                        |
|           |                                    |TLS_EMPTY_RENEGOTIATION_INFO_SCSV]       |
--------------------------------------------------------------------------------------------
|mydest     |source_id                           |source1                                  |
--------------------------------------------------------------------------------------------
|mydest     |transmission_enabled                |true                                     |
--------------------------------------------------------------------------------------------
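
Because list-conf prints a pipe-delimited table, a single setting can be pulled out with awk. The following is a minimal sketch based only on the table layout shown above. The function name advrep_conf_value is hypothetical, and for values that wrap across several rows (such as driver_ssl_cipher_suites) it returns only the first row.

```shell
# Hypothetical helper: read "dse advrep destination list-conf" output on stdin
# and print the value column for one destination/setting pair.
advrep_conf_value() {
    awk -F'|' -v dest="$1" -v key="$2" '
        # Strip the padding that the table layout adds around each cell.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Data rows have at least three cells between the outer pipes.
        NF >= 4 && trim($2) == dest && trim($3) == key { print trim($4); exit }
    '
}
```

For example, dse advrep destination list-conf | advrep_conf_value mydest addresses prints the configured address for the destination in this walkthrough.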

Creating the replication channel

A replication channel is a defined channel of change data between source clusters and destination clusters. A replication channel is defined by the source cluster, source keyspace, source table name, destination cluster, destination keyspace, and destination table name. Source clusters can exist in multi-datacenter clusters, but a replication channel is configured with only one datacenter as the responsible party.

The keyspace and table names on the destination can be different from those on the source, but in this example they are the same. You can also set the source-id and source-id-column differently from the global setting.

To create the replication channel for our keyspace and table:
dse advrep channel create --source-keyspace foo --source-table bar --source-id source1 --source-id-column source_id --destination mydest --destination-keyspace foo --destination-table bar --collection-enabled true --transmission-enabled true --priority 1
Created channel dc=Cassandra keyspace=foo table=bar to mydest
To verify the channel, run this command:
dse advrep channel status
------------------------------------------------------------------------------------------------------------------------------------------------------
|dc       |keyspace|table          |collecting|transmitting|replication order|priority|dest ks|dest table     |src id |src id col|dest  |dest enabled|
------------------------------------------------------------------------------------------------------------------------------------------------------
|Cassandra|foo     |bar            |true      |true        |FIFO             |1       |foo    |bar            |source1|source_id |mydest|true        |
------------------------------------------------------------------------------------------------------------------------------------------------------
Warning: The keyspace designated for a replication channel must have durable writes enabled. If durable_writes = false, an error is reported and the channel is not created. If durable writes are disabled after the replication channel is created, the tables no longer write to the commit log and CDC does not work. Data is not ingested through the replication channel; a warning is logged, but the failure is otherwise silent.
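
The durable writes requirement in the warning above can be checked before a channel is created. The following is a minimal sketch: the function name keyspace_is_durable is hypothetical, and it assumes cqlsh is on the PATH and can reach the source node with its default connection settings. It reads the durable_writes column from the system_schema.keyspaces table.

```shell
# Hypothetical guard: succeed only if the keyspace has durable writes enabled.
# Assumes cqlsh is on PATH and connects to the source node with default settings.
keyspace_is_durable() {
    ks=$1
    # cqlsh prints boolean values as True or False; a missing keyspace prints no row.
    cqlsh -e "SELECT durable_writes FROM system_schema.keyspaces WHERE keyspace_name = '$ks';" \
        | grep -q 'True'
}
```

For the sample keyspace, keyspace_is_durable foo || echo "enable durable writes on foo first" reports the problem before dse advrep channel create is run.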

Starting replication from source to destination

At this point, replication is configured, the replication channel is enabled, and replication has started.
  1. On the destination, use cqlsh to verify that no data is present:
    SELECT * FROM foo.bar;
    name  | source_id | scalar | val
    ------+-----------+--------+-----
    (0 rows)
  2. On the source, replication to the destination can be paused or resumed, the latter shown here:
    dse advrep channel resume --source-keyspace foo --source-table bar --transmission
    Channel dc=Cassandra keyspace=foo table=bar  collection to mydest was resumed
    Notice that either --transmission or --collection can be specified, to resume transmission from the source to the destination or to resume collection of data on the source.
  3. Review the number of records that are in the replication log. Because no data is inserted yet, the record count in the replication log is 0:
    dse advrep replog count --destination mydest --source-keyspace foo --source-table bar
    0

Inserting data on the source

Insert data on the source for replication to the destination.
  1. On the source, insert data using cqlsh:
    INSERT INTO foo.bar (name, val, scalar) VALUES ('a', '1', 1);
    INSERT INTO foo.bar (name, val, scalar) VALUES ('b', '2', 2);
  2. On the destination, verify that the data was replicated:
    SELECT * FROM foo.bar;
    name  | source_id | scalar | val
    ------+-----------+--------+-----
    a     | source1   | 1      | 1
    b     | source1   | 2      | 2
    (2 rows)

Testing loss of connectivity

To test loss of connectivity to the destination, stop the DataStax Enterprise process on the destination, and insert more data on the source. The expected result is for data to be replicated quickly after the destination cluster resumes.
  1. On the destination cluster, stop DataStax Enterprise:
    dse cassandra-stop
  2. On the source, insert more data:
    INSERT INTO foo.bar (name, val, scalar) VALUES ('c', '3', 3);
    INSERT INTO foo.bar (name, val, scalar) VALUES ('d', '4', 4);
  3. Review the number of records that are in the replication log. The replication log should have 2 entries:
    dse advrep replog count --destination mydest --source-keyspace foo --source-table bar
    2
  4. On the destination, restart DataStax Enterprise:
    dse cassandra
    Wait a moment for communication to resume and for the new records to replicate from the source to the destination, then verify the data:
    SELECT * FROM foo.bar;
    name  | source_id | scalar | val
    ------+-----------+--------+-----
    a     | source1   | 1      | 1
    c     | source1   | 3      | 3
    d     | source1   | 4      | 4
    b     | source1   | 2      | 2
    (4 rows)
  5. On the source, the replication log count should be back to 0:
    dse advrep replog count --destination mydest --source-keyspace foo --source-table bar
    0
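
Instead of rerunning the count by hand after the destination restarts, the check can be polled. The following is a minimal sketch that reuses the replog count command from this section. The function name wait_for_replog_drain is hypothetical, the default attempt count and delay are arbitrary illustration values, and the sketch assumes the command prints only the count, as shown above.

```shell
# Hypothetical helper: poll the replication log until it is empty.
# Arguments: number of attempts (default 30) and seconds between attempts (default 2).
wait_for_replog_drain() {
    attempts=${1:-30}
    delay=${2:-2}
    while [ "$attempts" -gt 0 ]; do
        # Same command as in the steps above; whitespace is stripped before comparing.
        count=$(dse advrep replog count --destination mydest \
                    --source-keyspace foo --source-table bar | tr -d '[:space:]')
        if [ "$count" = "0" ]; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    echo "replication log still has ${count:-unknown} entries" >&2
    return 1
}
```

Run on the source as wait_for_replog_drain 30 2, the function returns 0 once the log is empty and 1 if the entries have not drained within the allotted attempts.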

Testing replication start and stop

As with testing loss of connectivity, you can pause and resume individual replication channels by using the dse advrep command line tool. The expected result is that data inserted while collection is paused is not saved to the replication log and is never sent to the destination.
  1. On the source, pause the replication channel:
    dse advrep --verbose channel pause --source-keyspace foo --source-table bar --collection
  2. Insert more data.
  3. On the source, resume the replication channel:
    dse advrep --verbose channel resume --source-keyspace foo --source-table bar --collection
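
The three steps can be run together from the source node. The following is a minimal sketch: the function name run_pause_resume_test and the inserted row ('e', '5', 5) are illustrative assumptions, cqlsh is assumed to connect to the source node with default settings, and the channel is addressed with the --source-keyspace and --source-table options used by the channel resume command earlier on this page.

```shell
# Hypothetical wrapper around the pause / insert / resume steps.
run_pause_resume_test() {
    # Stop collecting changes for foo.bar on the source.
    dse advrep --verbose channel pause --source-keyspace foo --source-table bar --collection || return 1
    # Row 'e' is an arbitrary example value; it should never reach the destination.
    cqlsh -e "INSERT INTO foo.bar (name, val, scalar) VALUES ('e', '5', 5);" || return 1
    # Start collecting again; only writes made after this point are captured.
    dse advrep --verbose channel resume --source-keyspace foo --source-table bar --collection || return 1
}
```

After the function returns, SELECT * FROM foo.bar on the destination should not show the row inserted during the pause.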