Managing invalid messages

DSE Advanced Replication strategies for managing invalid messages when replication fails.

During message replication, DSE Advanced Replication attempts to manipulate the message to ensure successful replication. In some cases, replication might occur with only a subset of the data.

In other cases, replication fails when there are too many differences between the schema on the edge cluster and the schema on the hub cluster. For example, schema incompatibilities occur when a column in the hub has a different type than the same column in the edge, or a table in the edge doesn’t contain all the columns that form the primary key of the same table in the hub.

When a message cannot be replicated, you can configure the maximum number of retries. If replication still fails after that maximum is reached, the message is discarded and removed from the replication log. The replication log on the source cluster stores data in preparation for transmission to the destination cluster.
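The retry-then-discard behavior described above can be pictured with a minimal Python sketch. It is an illustration only, not the DSE implementation: `replicate_with_retries`, `send`, and `on_discard` are hypothetical names, and counting one initial attempt plus `max_retries` retries is an assumption.

```python
from typing import Callable


def replicate_with_retries(
    message: str,
    send: Callable[[str], None],
    max_retries: int,
    on_discard: Callable[[str, str], None],
) -> bool:
    """Try to send a message; discard it once the retries are exhausted.

    Hypothetical illustration: one initial attempt plus up to max_retries
    retries. Returns True if the message was sent, False if it was discarded.
    """
    last_error = ""
    for _attempt in range(1 + max_retries):
        try:
            send(message)
            return True
        except Exception as exc:  # any replication failure counts as a miss
            last_error = str(exc)
    # Retries exhausted: pass the query and the error to the logging strategy.
    on_discard(message, last_error)
    return False
```

The `on_discard` callback stands in for whichever logging strategy is configured, so the same loop works whether the discarded message goes to a system log, to a table, or nowhere.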

When a message is discarded, the CQL query string and the related error message are logged on the hub cluster. To define where to store the CQL strings and the error messages that are relevant to the failed message replication, use one of the following logging strategies:

SYSTEM_LOG: Log the CQL query and the error message in the system log on the hub. This is the default value.
TABLE_LOG: Store the CQL query and the error message in the dse_advrep.advrep_invalid_messages_log Cassandra table on the hub.
NONE: Perform no logging.

For the table logging strategy, a record is inserted for each invalid message. The record stores the following data that is relevant to the failed message replication:

keyspace_name: keyspace name of the invalid query
table_name: table name of the invalid query
time_bucket: an hourly time bucket to prevent the Cassandra partition from getting too wide
id: a time-based ID (timeuuid)
cql_string: the CQL query string, which explicitly specifies the original timestamp by including the USING TIMESTAMP option
error_msg: the error message

Invalid messages are sorted by time in the log table.
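To make the role of the time bucket concrete, here is a small Python sketch of a record with the fields listed above. The field names come from the list. The Python types, the helper names `hourly_bucket` and `make_record`, and the choice to represent the bucket as the start of the hour in UTC are assumptions, not the actual table schema.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class InvalidMessageRecord:
    keyspace_name: str
    table_name: str
    time_bucket: datetime  # assumed representation: start of the hour, UTC
    id: uuid.UUID          # time-based UUID (version 1)
    cql_string: str
    error_msg: str


def hourly_bucket(ts: datetime) -> datetime:
    """Truncate a timezone-aware timestamp to the start of its hour in UTC."""
    return ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)


def make_record(keyspace: str, table: str, cql: str, error: str,
                now: datetime) -> InvalidMessageRecord:
    """Build a record (hypothetical helper).

    Note: uuid.uuid1() is stamped with the current clock time, not with `now`.
    """
    return InvalidMessageRecord(
        keyspace, table, hourly_bucket(now), uuid.uuid1(), cql, error
    )
```

Because every record written within the same hour shares one bucket value, a partition never holds more than one hour of invalid messages, and the time-based id orders the records inside it.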

Global settings apply to the entire edge cluster. These settings are stored in the CQL table dse_system.advrep_conf, which is created automatically. To define configuration keys that change global settings, you can use the dse advrep command line tool or write directly to the CQL table. No action is required to keep using the default system log strategy.

Procedure

To manage invalid messages using the table logging strategy:

To change the maximum number of times to retry an invalid message before discarding it and removing it from the replication log, specify a value for the invalid_message_max_retries configuration key:
```
dse advrep edge conf invalid_message_max_retries 20
```
To store the CQL query string and error message in a table, instead of the default system log location, specify the invalid_message_log configuration key as TABLE_LOG:
```
dse advrep edge conf invalid_message_log TABLE_LOG
```
To identify the problem, examine the error messages, the CQL query strings, and the schemas of the data on the edge and the hub.
Take appropriate actions to resolve the incompatibility issues.
Replay the invalid messages. Because the invalid messages are sorted by time stamp in the dse_advrep.advrep_invalid_messages_log Cassandra table, it makes sense to replay the messages in order.
Counters are not supported. Because each CQL string specifies the original timestamp, the CQL queries are idempotent and remain unchanged, so you can also replay the messages out of order.
1. Read the records of interest from the log table: filter by keyspace, table, and time bucket.
2. Extract the CQL query string from the cql_string column of each record.
3. Run the extracted CQL queries against the hub.
4. After the queries succeed, remove the records from the log table.
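
The four replay steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: records are modeled as plain dictionaries keyed by the column names of the log table, and `execute` and `delete_record` are hypothetical callables that you would back with your own client code for running CQL against the hub and deleting rows from the log table.

```python
from typing import Callable, Dict, Iterable, List


def replay_invalid_messages(
    records: Iterable[Dict[str, object]],
    keyspace: str,
    table: str,
    time_bucket: object,
    execute: Callable[[str], None],
    delete_record: Callable[[Dict[str, object]], None],
) -> List[str]:
    """Replay logged queries and remove the records that succeed.

    Returns the CQL strings that were replayed successfully.
    """
    # Step 1: keep only the records for the requested keyspace, table, bucket.
    selected = [
        r for r in records
        if r["keyspace_name"] == keyspace
        and r["table_name"] == table
        and r["time_bucket"] == time_bucket
    ]
    replayed: List[str] = []
    for record in selected:
        cql = str(record["cql_string"])  # Step 2: extract the query string.
        try:
            execute(cql)                 # Step 3: run it against the hub.
        except Exception:
            continue                     # Leave failed records in the log.
        delete_record(record)            # Step 4: remove the replayed record.
        replayed.append(cql)
    return replayed
```

Records whose query still fails are left in place, so a later pass can retry them after the remaining schema incompatibility is fixed.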