CQL queries in DSE Advanced Replication

Supported CQL queries and best practices guidelines for running queries on source and destination clusters in DSE Advanced Replication.

This overview of supported CQL queries and replication concepts for DSE Advanced Replication provide details on supported CQL queries and best practices guidelines.

DSE Advanced Replication replicates data from source clusters to destination clusters. Replication takes the CQL query on the source and then recreates a modified version of the query and runs it on the destination. DataStax Enterprise supports a restricted list of valid CQL queries to manipulate data. In DSE Advanced Replication, the same restrictions apply to the generated CQL queries that are used to replicate data into the destination.

Restrictions apply to the primary key. The primary key consists of two parts: the partition key and the clustering key. The primary key parts plus the optional field values comprise the database row.

If differences exist between the primary key on the source table and the primary key on the destination table, restrictions apply for which CQL queries are supported.

Best practices

DataStax recommends the following best practices to ensure seamless replication.

Schema structure on the source table and the destination table

Maintain an identical primary key (partition keys and clustering keys) format in the same order, with the same columns.
Add the optional source_id as the first clustering column.
Maintain all, or a subset of, the field values.

Note: Although the source_id column can be present in the source table schema, values that are inserted into that column are ignored. When records are replicated, the configured source-id value is used.

Partition key columns

The following list details support and restrictions for partition keys:

In the destination table, only an additional optional source_id column is supported in the partition key. Additional destination table partition key columns are not supported. The source_id can be either a clustering column or a partition key, but not both.
Using a subset of source table partition key columns in the destination table might result in overwriting. There is a many-to-one mapping for row entries.
Order is irrelevant for replication. All permutations are supported.
CQL UPDATE queries require that all of the partition key columns are fully restricted. Restrict partition key columns using = or IN (single column) restrictions.
CQL DELETE queries require that all of the partition key columns are fully restricted. Restrict partition key columns using = or IN (single column) restrictions.

Clustering columns

The following list details support and restrictions for clustering columns:

In the destination table, only an additional optional source_id column is supported in the clustering column. Additional destination table partition key columns are not supported. The source_id can be either a clustering column or a partition key, but not both.
Using a subset of source table clustering columns in the destination table might result in overwriting. There is a many-to-one mapping for row entries.
Order is irrelevant for replication when using CQL INSERT and UPDATE queries. All permutations are supported.
Order is relevant for replication when using CQL DELETE queries. There are limits to permutation support, all permutations are not supported.
CQL UPDATE queries require that all of the clustering columns are fully restricted. Restrict partition key columns using = or IN (single column) restrictions.
CQL DELETE queries require that the last-specified clustering column be restricted using =/>/>=/</<= (single or multiple column) or IN (single or multiple column). All of the clustering columns that precede the last-specified clustering column must also be restricted using = or IN.
Restricting clustering columns is optional. However, if you do restrict clustering columns, then all of the clustering columns that you restrict between the first and last (in order) clustering columns must be restricted.

Field values

The following list details support and requirements for field values:

A subset, or all, of the field values on the source are supported for replication to the destination.
Fields that are present on the source, but absent on the destination, are not replicated.
Fields that are present on the destination, but absent on the source, are not populated.

Source ID (source_id)

The source_id identifies the source cluster and all inserts from the source cluster. The following list details support and requirements for the source_id:

The source_id configuration key must be present and correct in the metadata.
The source_id must be the first position in the clustering column, or any of the partition keys.
If not, then the CQL INSERT and UPDATE queries should work, but the CQL DELETE queries with partially restricted clustering columns might fail.
The source_id is always restricted in CQL DELETE and UPDATE queries. Certain delete statements are not supported where the clustering key is not fully restricted, and the source_id is not the first clustering column