CQL queries in DSE Advanced Replication

Supported CQL queries and best practices guidelines for running queries on edge and hub clusters in DSE Advanced Replication.

This overview of supported CQL queries and replication concepts for DSE Advanced Replication provide details on supported CQL queries and best practices guidelines. You must follow these best practices to achieve consistent replication.

DSE Advanced Replication replicates data from edge clusters to hub clusters. Replication takes the CQL query on the edge and then recreates a modified version of the query and runs it on the hub. DataStax Enterprise supports a restricted list of valid CQL queries to manipulate data. In DSE Advanced Replication, the same restrictions apply to the generated CQL queries that are used to replicate data into the hub.

Restrictions apply to the primary key. The primary key consists of two parts: the partition key and the clustering key. The primary key parts plus the optional field values comprise the database row.

If differences exist between the primary key on the edge table and the primary key on the hub table, restrictions apply for which CQL queries are supported.

Best practices

DataStax recommends the following best practices to ensure seamless replication.

Schema structure on the edge tables and the hub tables

Maintain an identical primary key (partition keys and clustering keys) format in the same order, with the same columns.
Add the optional edge_id as the first clustering column.
Maintain all, or a subset of, the field values.

Note: Although the edge_id column can be present in the table schema on the edge, values that are inserted into that column are ignored. When records are replicated, the configured edge-id value is used.

Partition key columns

The following list details support and restrictions for partition keys:

On the hub, only an additional optional edge_id column is supported in the partition key. Additional hub partition key columns are not supported. The edge_id can be either a clustering column or a partition key, but not both.
Using a subset of edge partition key columns on the hub might result in overwriting. There is a many-to-one mapping for row entries.
Order is irrelevant for replication. All permutations are supported.
CQL UPDATE queries require that all of the partition key columns are fully restricted. Restrict partition key columns using = or IN (single column) restrictions.
CQL DELETE queries require that all of the partition key columns are fully restricted. Restrict partition key columns using = or IN (single column) restrictions.

Clustering columns

The following list details support and restrictions for clustering columns:

On the hub, only an additional optional edge_id column is supported in the clustering column. Additional hub partition key columns are not supported. The edge_id can be either a clustering column or a partition key, but not both.
Using a subset of edge clustering columns on the hub might result in overwriting. There is a many-to-one mapping for row entries.
Order is irrelevant for replication when using CQL INSERT and UPDATE queries. All permutations are supported.
Order is relevant for replication when using CQL DELETE queries. There are limits to permutation support, all permutations are not supported.
CQL UPDATE queries require that all of the clustering columns are fully restricted. Restrict partition key columns using = or IN (single column) restrictions.
CQL DELETE queries require that the last-specified clustering column be restricted using =/>/>=/</<= (single or multiple column) or IN (single or multiple column). All of the clustering columns that precede the last-specified clustering column must also be restricted using = or IN.
Restricting clustering columns is optional. However, if you do restrict clustering columns, then all of the clustering columns that you restrict between the first and last (in order) clustering columns must be restricted.

Field values

The following list details support and requirements for field values:

A subset, or all, of the field values on the edge are supported for replication to the hub.
Fields that are present on the edge, but absent on the hub, are not replicated.
Fields that are present on the hub, but absent on the edge, are not populated.

Edge ID (edge_id)

The edge_id identifies the edge cluster and all inserts from the edge cluster. The following list details support and requirements for the edge_id:

The edge_id configuration key must be present and correct in the metadata.
The edge_id must be the first position in the clustering column, or any of the partition keys.
If not, then the CQL INSERT and UPDATE queries should work, but the CQL DELETE queries with partially restricted clustering columns might fail.
The edge_id is always restricted in CQL DELETE and UPDATE queries. Certain delete statements are not supported where the clustering key is not fully restricted, and the edge_id is not the first clustering column

Tip: Follow best practice guidelines for consistent replication. For example, the following is unsupported for DSE Advanced Replication:

PRIMARY KEY ((location, cell, testdate, model, testnumber, reading, scan, testtype, readingtype, payloadtype, datatype), edge_id, measurementid)) WITH CLUSTERING ORDER BY (measurementid DESC)

By using the WITH CLUSTERING ORDER BY (measurementid) the order of the clustering columns is changed because measurementid precedes edge_id.

To use best practices, do one of the following:

Restrict measurementid
Remove the with clustering order clause
Change the clustering order to edge_id