CQL queries in DSE Advanced Replication
This overview of supported CQL queries and replication concepts for DSE Advanced Replication provide details on supported CQL queries and best practices guidelines.
DSE Advanced Replication replicates data from source clusters to destination clusters. Replication takes the CQL query on the source and then recreates a modified version of the query and runs it on the destination. DataStax Enterprise supports a restricted list of valid CQL queries to manipulate data. In DSE Advanced Replication, the same restrictions apply to the generated CQL queries that are used to replicate data into the destination.
Restrictions apply to the primary key. The primary key consists of two parts: the partition key and the clustering key. The primary key parts plus the optional field values comprise the database row.
If differences exist between the primary key on the source table and the primary key on the destination table, restrictions apply for which CQL queries are supported.
Best practices
DataStax recommends the following best practices to ensure seamless replication.
- Schema structure on the source table and the destination table
-
-
Maintain an identical primary key (partition keys and clustering keys) format in the same order, with the same columns.
-
Add the optional
source_id
as the first clustering column. -
Maintain all, or a subset of, the field values.
Although the
source_id
column can be present in the source table schema, values that are inserted into that column are ignored. When records are replicated, the configuredsource-id
value is used. -
- Partition key columns
-
The following list details support and restrictions for partition keys:
-
In the destination table, only an additional optional
source_id
column is supported in the partition key. Additional destination table partition key columns are not supported. Thesource_id
can be either a clustering column or a partition key, but not both. -
Using a subset of source table partition key columns in the destination table might result in overwriting. There is a many-to-one mapping for row entries.
-
Order is irrelevant for replication. All permutations are supported.
-
CQL
UPDATE
queries require that all of the partition key columns are fully restricted. Restrict partition key columns using = orIN
(single column) restrictions. -
CQL
DELETE
queries require that all of the partition key columns are fully restricted. Restrict partition key columns using = orIN
(single column) restrictions.
-
Clustering columns
The following list details support and restrictions for clustering columns:
-
In the destination table, only an additional optional
source_id
column is supported in the clustering column. Additional destination table partition key columns are not supported. Thesource_id
can be either a clustering column or a partition key, but not both. -
Using a subset of source table clustering columns in the destination table might result in overwriting. There is a many-to-one mapping for row entries.
-
Order is irrelevant for replication when using CQL
INSERT
andUPDATE
queries. All permutations are supported. -
Order is relevant for replication when using CQL
DELETE
queries. There are limits to permutation support, all permutations are not supported. -
CQL
UPDATE
queries require that all of the clustering columns are fully restricted. Restrict partition key columns using = orIN
(single column) restrictions. -
CQL
DELETE
queries require that the last-specified clustering column be restricted using =/>/>=/</<= (single or multiple column) orIN
(single or multiple column). All of the clustering columns that precede the last-specified clustering column must also be restricted using = orIN
. -
Restricting clustering columns is optional. However, if you do restrict clustering columns, then all of the clustering columns that you restrict between the first and last (in order) clustering columns must be restricted.
Field values
The following list details support and requirements for field values:
-
A subset, or all, of the field values on the source are supported for replication to the destination.
-
Fields that are present on the source, but absent on the destination, are not replicated.
-
Fields that are present on the destination, but absent on the source, are not populated.
Source ID (source_id
)
The source_id
identifies the source cluster and all inserts from the source cluster.
The following list details support and requirements for the source_id
:
-
The
source_id
configuration key must be present and correct in the metadata. -
The
source_id
must be the first position in the clustering column, or any of the partition keys.If not, then the CQL
INSERT
andUPDATE
queries should work, but the CQLDELETE
queries with partially restricted clustering columns might fail. -
The
source_id
is always restricted in CQLDELETE
andUPDATE
queries. Certain delete statements are not supported where the clustering key is not fully restricted, and thesource_id
is not the first clustering column