Setting the replication factor for analytics keyspaces
Keyspaces and tables are automatically created when DSE Analytics nodes are started for the first time. The replication factor must be adjusted for these keyspaces in order for the analytics features to work properly and to avoid data loss.
The keyspaces used by DSE Analytics are the following:
- cfs
- cfs_archive
- dse_leases
- dsefs
- "HiveMetaStore"
- spark_system
All analytics keyspaces are initially created with the SimpleStrategy
replication strategy and a replication factor (RF) of 1.
Each of these must be updated in production environments to avoid data loss.
After starting the cluster, alter each keyspace to use the NetworkTopologyStrategy
replication strategy with appropriate settings for the replication factor and datacenters.
For most environments using DSE Analytics, a suitable replication factor is either 3 or the cluster size, whichever is smaller.
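The rule of thumb above can be sketched as a small shell helper. The function name suggest_rf is hypothetical, and the node count passed in is assumed to be the number of nodes in the datacenter being configured.

```shell
#!/bin/sh
# Hypothetical helper: suggest a replication factor of 3 or the node count,
# whichever is smaller. The node count is supplied by the caller.
suggest_rf() {
  nodes=$1
  if [ "$nodes" -lt 3 ]; then
    echo "$nodes"
  else
    echo 3
  fi
}

suggest_rf 2   # prints 2
suggest_rf 6   # prints 3
```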
For example, the following CQL statement configures the dse_leases keyspace with a replication factor of 3 in both the DC1 and DC2 datacenters using NetworkTopologyStrategy:
ALTER KEYSPACE dse_leases
WITH REPLICATION = {
  'class': 'NetworkTopologyStrategy',
  'DC1': '3',
  'DC2': '3'
};
Only replicate DSE Analytics keyspaces to other DSE Analytics datacenters.
DSEFS does not support replication to other datacenters, and the |
The datacenter name used is case-sensitive.
If needed, use the dsetool status command to confirm the exact datacenter spelling.
After adjusting the replication factor, run nodetool repair on each node in the affected datacenters.
For example, to repair the altered keyspace dse_leases:
$ nodetool repair -full dse_leases
Repeat these steps for each of the analytics keyspaces listed above. For more information, see Changing keyspace replication strategy.
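Because the same ALTER KEYSPACE statement is needed for every analytics keyspace, the statements can be generated rather than typed by hand. The following is a minimal sketch, not an official tool: the function name gen_alter_statements is hypothetical, it assumes a single analytics datacenter, and the datacenter name and replication factor are placeholders to replace with your own values. The script only prints CQL and changes nothing by itself. "HiveMetaStore" keeps its double quotes because the name is case-sensitive in CQL.

```shell
#!/bin/sh
# Hypothetical helper: print one ALTER KEYSPACE statement per analytics keyspace.
# $1 = datacenter name (case-sensitive), $2 = replication factor.
gen_alter_statements() {
  dc=$1
  rf=$2
  # "HiveMetaStore" is passed with its double quotes so CQL preserves the case.
  for ks in cfs cfs_archive dse_leases dsefs '"HiveMetaStore"' spark_system; do
    printf "ALTER KEYSPACE %s WITH REPLICATION = {'class': 'NetworkTopologyStrategy', '%s': '%s'};\n" \
      "$ks" "$dc" "$rf"
  done
}

# Example: print statements for a datacenter assumed to be named DC1, RF 3.
gen_alter_statements DC1 3
```

One way to use the output is to redirect it to a file, review it, and then run it with cqlsh -f. Remember to run nodetool repair for each altered keyspace afterward.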