Getting started with DSE Advanced Replication

Steps to set up edge and hub DataStax Enterprise clusters for testing DSE Advanced Replication.

To test Advanced Replication, you must set up an edge cluster and a hub cluster. These steps set up one node in each cluster.

Getting started overview:
  1. Setting up the hub cluster node
  2. Setting up the edge cluster
  3. Creating the sample keyspace and table
  4. Configuring replication on the edge node
  5. Creating the replication channel
  6. Starting replication from edge to hub
  7. Inserting data on the edge
  8. Testing loss of connectivity
  9. Testing replication start and stop
Note: Due to CASSANDRA-11368, list inserts might not be idempotent; that is, applying the same insert more than once can change the result. Because DSE Advanced Replication might deliver the same message to the hub more than once, this Cassandra bug might lead to data inconsistency if lists are used in a column family schema. DataStax recommends using other collection types, such as sets when ordering is not important, or frozen lists.
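
To make the idempotency concern concrete, the following is a minimal Python sketch, not Cassandra code. It is a toy model that assumes a list append adds an element on every delivery while a set add does not; the function names are hypothetical and exist only for illustration.

```python
def apply_list_append(column, value):
    # Toy model of a list append: every delivery adds another element.
    return column + [value]


def apply_set_add(column, value):
    # Toy model of a set add: repeated deliveries leave the set unchanged.
    return column | {value}


list_col = []
set_col = set()
for _ in range(2):  # the same message is delivered twice
    list_col = apply_list_append(list_col, "x")
    set_col = apply_set_add(set_col, "x")

print(list_col)  # ['x', 'x']
print(set_col)   # {'x'}
```

In this model the duplicated delivery changes the list but not the set, which is the kind of inconsistency the note warns about.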

Setting up the hub cluster node 

The hub cluster requires DataStax Enterprise 4.8 or later. On the hub node:
  1. Install DataStax Enterprise 4.8 or later.
  2. Start DataStax Enterprise as a transactional node with the command that is appropriate for the installation method.
  3. Note the public IP address for the hub node.

Setting up the edge cluster 

The edge cluster requires DataStax Enterprise 5.0 or later. On the edge node:
  1. Install DataStax Enterprise 5.0 or later.
  2. To enable replication, edit the dse.yaml file.
    At the end of the file, uncomment the advanced_replication_options setting and options, and set enabled: true.
    # Advanced Replication configuration settings
    advanced_replication_options:
        enabled: true
  3. After Advanced Replication is started on a cluster, the edge node creates keyspaces and tables that need alteration. See Keyspaces for information.
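
As a quick sanity check after editing dse.yaml, a small script can confirm that the setting is uncommented and enabled. The sketch below is an assumption-laden illustration: it does simple line matching rather than real YAML parsing, the function name advrep_enabled is hypothetical, and the sample text mirrors only the fragment shown above.

```python
import re


def advrep_enabled(dse_yaml_text):
    """Return True if advanced_replication_options contains an uncommented 'enabled: true'."""
    in_section = False
    for raw in dse_yaml_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing spaces
        if not line.strip():
            continue
        if not line.startswith((" ", "\t")):
            # A top-level key starts or ends the section of interest.
            in_section = line.strip() == "advanced_replication_options:"
            continue
        if in_section and re.fullmatch(r"enabled:\s*true", line.strip()):
            return True
    return False


sample = """\
# Advanced Replication configuration settings
advanced_replication_options:
    enabled: true
"""
print(advrep_enabled(sample))  # True
```

To check a real file, read it from the dse.yaml location for your installation type and pass its contents to the function.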

Creating the sample keyspace and table 

These steps show you how to create the demonstration keyspace and table.

  1. On the edge node and the hub node, create the sample keyspace:
    cqlsh> create keyspace foo with replication = {'class': 'SimpleStrategy', 'replication_factor':1};
  2. On the edge node:
    cqlsh> create table foo.bar (name text, val text, scalar int, primary key (name));
  3. On the hub node:
    cqlsh> create table foo.bar (name text, val text, scalar int, edge_id text, primary key (name, edge_id));
    Note: The edge_id column is required on the hub node.
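
One way to read the hub schema: because edge_id is part of the hub primary key, rows with the same name from different edges occupy different hub rows. The Python sketch below is a toy model of that idea, not DSE code; the helper to_hub_row and the second edge "edge2" are hypothetical.

```python
def to_hub_row(edge_row, edge_id, edge_id_col_name="edge_id"):
    """Return a copy of an edge row with the edge identifier column added."""
    hub_row = dict(edge_row)
    hub_row[edge_id_col_name] = edge_id
    return hub_row


hub_table = {}  # toy hub table keyed by the hub primary key (name, edge_id)
edge_row = {"name": "a", "val": "1", "scalar": 1}

for edge_id in ("edge1", "edge2"):  # "edge2" is a hypothetical second edge
    row = to_hub_row(edge_row, edge_id)
    hub_table[(row["name"], row["edge_id"])] = row

print(len(hub_table))  # 2
```

Both rows survive in the toy hub table because their keys differ in the edge_id component.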

Configuring replication on the edge node 

DSE Advanced Replication stores all of its settings in CQL tables. To configure replication, load values directly to the tables, or use the dse advrep command line tool.

When you configure replication on the edge node:
  • The edge node points to its hub using the public IP address that you noted earlier.
  • The edge-id value is a unique identifier for all data that comes from this particular edge node.
  • The edge-id value is written to the column named by edge-id-col-name (edge_id in this example), which was included when the foo.bar table was created on the hub node.
To configure replication, run this command:
dse advrep edge conf --edge-id "edge1" --edge-id-col-name "edge_id" --hub-ip-addresses "10.200.177.184"
Set replication config edge_id_col_name from None to edge_id
Set replication config hub_ip_addresses from None to 10.200.177.184
Set replication config edge_id from None to edge1
To verify the configuration, run this command:
dse advrep edge list-conf
field            | value
--------------------------------
edge_id          | edge1
hub_ip_addresses | 10.200.177.184
edge_id_col_name | edge_id
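
If you want to check the configuration in a script, the table printed by list-conf can be parsed into a dictionary. This is a minimal sketch that assumes the output keeps the "field | value" layout shown above; parse_conf_table is a hypothetical helper, and the sample text uses the hub address passed to the configuration command.

```python
def parse_conf_table(output):
    """Parse 'field | value' rows into a dict, skipping the header and rule lines."""
    conf = {}
    for line in output.splitlines():
        if "|" not in line:
            continue  # separator rule or blank line
        field, value = (part.strip() for part in line.split("|", 1))
        if field == "field":
            continue  # header row
        conf[field] = value
    return conf


sample = """\
field            | value
--------------------------------
edge_id          | edge1
hub_ip_addresses | 10.200.177.184
edge_id_col_name | edge_id
"""
conf = parse_conf_table(sample)
print(conf["hub_ip_addresses"])  # 10.200.177.184
```

Comparing the parsed values against the ones you passed on the command line catches a mistyped hub address before replication is started.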

Creating the replication channel 

A replication channel is a defined channel of change data between edge clusters and hub clusters.

The keyspace and table names on the hub can be different from those on the edge, but in this example they are the same. You can also set the edge-id and edge-id-col-name differently from the global setting.

To create the replication channel for our keyspace and table:
dse advrep edge channel create --keyspace foo --table bar --hub-keyspace foo --hub-table bar --priority 1 --edge-id edge1 --edge-id-col-name edge_id --enabled
Created Replication Channel foo:bar
To check the channel status, run this command:
dse advrep edge channel status
keyspace_name | table_name | edge_id | edge_id_col_name | enabled | hub_keyspace_name | hub_table_name | priority | truncate_timestamp
--------------------------------------------------------------------------------------------------------------------------------------
foo           | bar        | edge1   | edge_id          | True    | foo               | bar            | 1        | None
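
When several channels are needed, it can help to generate the command line from data. The Python sketch below only builds the command string using the flags shown above; it does not run anything. The helper name channel_create_command and the hub table name bar_from_edges are hypothetical, chosen to illustrate a hub table that is named differently from the edge table.

```python
import shlex


def channel_create_command(keyspace, table, hub_keyspace, hub_table,
                           priority, edge_id, edge_id_col_name):
    """Build the argument list for the channel create command shown above."""
    return [
        "dse", "advrep", "edge", "channel", "create",
        "--keyspace", keyspace, "--table", table,
        "--hub-keyspace", hub_keyspace, "--hub-table", hub_table,
        "--priority", str(priority),
        "--edge-id", edge_id, "--edge-id-col-name", edge_id_col_name,
        "--enabled",
    ]


# Same channel as in this example.
same = channel_create_command("foo", "bar", "foo", "bar", 1, "edge1", "edge_id")
print(shlex.join(same))

# Hypothetical variant: the hub table has a different name from the edge table.
other = channel_create_command("foo", "bar", "foo", "bar_from_edges", 1, "edge1", "edge_id")
print(shlex.join(other))
```

The first printed line reproduces the command used in this section; the second differs only in the --hub-table value.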

Starting replication from edge to hub 

At this point, replication is configured and the replication channel is enabled. No data is replicated until replication is started.
  1. On the hub, use cqlsh to verify that no data is present:
    cqlsh:foo> select * from foo.bar;
    name | edge_id | scalar | val
    ------+---------+--------+-----
    (0 rows)
  2. On the edge, start replication to the hub:
    dse advrep edge start
    edge replication started
    dse advrep edge status
    edge replication is running
  3. Review the number of records that are in the replication log. Because no data has been inserted yet, the record count in the replication log is 0:
    dse advrep edge rl-count
    0

Inserting data on the edge 

Insert data on the edge for replication to the hub.
  1. On the edge, insert data using cqlsh:
    cqlsh> insert into foo.bar (name, val, scalar) values ('a', '1', 1);
    cqlsh> insert into foo.bar (name, val, scalar) values ('b', '2', 2);
  2. On the hub, verify that the data was replicated:
    cqlsh:foo> select * from foo.bar;
    name  | edge_id | scalar | val
    ------+---------+--------+-----
    a     | edge1   | 1      | 1
    b     | edge1   | 2      | 2
    (2 rows)

Testing loss of connectivity 

To test loss of connectivity to the hub, stop the DataStax Enterprise process on the hub, and insert more data on the edge. The expected result is that the data is replicated soon after the hub cluster comes back online.
  1. On the hub, stop DataStax Enterprise:
    dse cassandra-stop
  2. On the edge, insert more data:
    cqlsh> insert into foo.bar (name, val, scalar) values ('c', '3', 3);
    cqlsh> insert into foo.bar (name, val, scalar) values ('d', '4', 4);
  3. Review the number of records that are in the replication log. The replication log should have 2 entries:
    dse advrep edge rl-count
    2
  4. On the hub, restart DataStax Enterprise:
    dse cassandra

    Wait a moment for communication to resume and for the new records to replicate from the edge to the hub, then verify the data on the hub:

    cqlsh> select * from foo.bar;
    name  | edge_id | scalar | val
    ------+---------+--------+-----
    a     | edge1   | 1      | 1
    c     | edge1   | 3      | 3
    d     | edge1   | 4      | 4
    b     | edge1   | 2      | 2
    (4 rows)
  5. On the edge, the replication log count should be back to 0:
    dse advrep edge rl-count
    0
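
The behavior in this test can be pictured as store-and-forward: writes wait in the replication log while the hub is unreachable and drain once it returns. The following Python sketch is a toy model under that assumption, not a description of how DSE implements the log; the class ToyEdge and its methods are hypothetical.

```python
from collections import deque


class ToyEdge:
    """Toy model of an edge node that queues writes until the hub is reachable."""

    def __init__(self, edge_id):
        self.edge_id = edge_id
        self.replication_log = deque()

    def insert(self, row):
        self.replication_log.append(row)

    def rl_count(self):
        return len(self.replication_log)

    def replicate(self, hub, hub_up):
        if not hub_up:
            return  # entries stay queued while the hub is unreachable
        while self.replication_log:
            row = self.replication_log.popleft()
            hub[(row["name"], self.edge_id)] = row


hub = {}
edge = ToyEdge("edge1")
edge.insert({"name": "c", "val": "3", "scalar": 3})
edge.insert({"name": "d", "val": "4", "scalar": 4})

edge.replicate(hub, hub_up=False)
print(edge.rl_count())            # 2

edge.replicate(hub, hub_up=True)
print(edge.rl_count(), len(hub))  # 0 2
```

The two counts mirror the rl-count values in the steps above: 2 while the hub is down, 0 after it comes back.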

Testing replication start and stop 

Similar to testing loss of connectivity, you can pause and resume individual replication channels by using the advrep command line tool. The expected result is that newly inserted data is not saved to the replication log and will never be sent to the hub.
  1. On the edge, pause the replication channel:
    dse advrep edge channel pause --keyspace foo --table bar
  2. Insert more data.
  3. On the edge, resume the replication channel:
    dse advrep edge channel resume --keyspace foo --table bar
You can also stop and start all replication to the hub. The expected result is that newly inserted data is saved to the replication log while replication is stopped, and sent to the hub after replication is restarted.
  1. On the edge, stop replication:
    dse advrep edge stop
  2. Insert more data.
  3. On the edge, start replication:
    dse advrep edge start
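
The two expected results in this section differ in one respect: a paused channel does not log new writes, while stopped replication logs them for later delivery. The Python sketch below is a toy model of that contrast as described here, not DSE code; the class ToyEdgeNode and its attributes are hypothetical.

```python
class ToyEdgeNode:
    """Toy model of the pause/resume and stop/start behavior described above."""

    def __init__(self):
        self.channel_paused = False
        self.replication_running = True
        self.local_rows = []       # data stored in the edge table
        self.replication_log = []  # writes waiting to be sent to the hub
        self.hub_rows = []         # data that reached the hub

    def insert(self, name):
        self.local_rows.append(name)
        if self.channel_paused:
            return  # paused channel: the write is never logged
        self.replication_log.append(name)
        self._send()

    def _send(self):
        if self.replication_running:
            self.hub_rows.extend(self.replication_log)
            self.replication_log.clear()

    def start(self):
        self.replication_running = True
        self._send()

    def stop(self):
        self.replication_running = False


node = ToyEdgeNode()

node.channel_paused = True   # models: channel pause
node.insert("e")
node.channel_paused = False  # models: channel resume
print(node.hub_rows, len(node.replication_log))  # [] 0

node.stop()                  # models: stop replication
node.insert("f")
print(len(node.replication_log))                 # 1
node.start()                 # models: start replication
print(node.hub_rows, len(node.replication_log))  # ['f'] 0
```

In this model, row "e" exists only on the edge, while row "f" reaches the hub once replication is started again.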

The location of the dse.yaml file depends on the type of installation:
  Installer-Services       /etc/dse/dse.yaml
  Package installations    /etc/dse/dse.yaml
  Installer-No Services    install_location/resources/dse/conf/dse.yaml
  Tarball installations    install_location/resources/dse/conf/dse.yaml