Setting the replication factor for analytics keyspaces

Guidelines and steps to set the replication factor for keyspaces on DSE Analytics nodes.

Keyspaces and tables are created automatically when DSE Analytics nodes start for the first time. The replication factor of these keyspaces must be adjusted so that analytics features work properly and data is not lost.

The keyspaces used by DSE Analytics are the following:

  • cfs
  • cfs_archive
  • dse_leases
  • dsefs
  • "HiveMetaStore"
  • spark_system

All analytics keyspaces are initially created with the SimpleStrategy replication strategy and a replication factor (RF) of 1. In production environments, each of these keyspaces must be updated to avoid data loss. After starting the cluster, alter each keyspace to use the NetworkTopologyStrategy replication strategy with appropriate replication factor settings for each datacenter. For most environments using DSE Analytics, a suitable replication factor is either 3 or the cluster size, whichever is smaller.
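The sizing rule above can be sketched as a small Python helper that builds a NetworkTopologyStrategy replication map. This is a hypothetical helper, not part of DSE or any DataStax driver. It assumes you already know how many nodes each datacenter contains, and it applies "3 or the cluster size, whichever is smaller" to each datacenter's node count, since a datacenter cannot hold more replicas than it has nodes.

```python
# Hypothetical helper: not part of DSE or any DataStax driver.
# Assumption: the "3 or the cluster size, whichever is smaller" rule is
# applied separately to the node count of each datacenter.

DEFAULT_TARGET_RF = 3


def replication_for(dc_node_counts, target_rf=DEFAULT_TARGET_RF):
    """Build a NetworkTopologyStrategy replication map.

    dc_node_counts maps an exact (case-sensitive) datacenter name to the
    number of nodes in that datacenter.
    """
    replication = {"class": "NetworkTopologyStrategy"}
    for dc, nodes in dc_node_counts.items():
        if nodes < 1:
            raise ValueError(f"datacenter {dc!r} has no nodes")
        # Use the target RF, or fewer replicas if the datacenter is smaller.
        # Values are strings, matching the CQL example in this section.
        replication[dc] = str(min(target_rf, nodes))
    return replication


if __name__ == "__main__":
    print(replication_for({"DC1": 5, "DC2": 2}))
```

With five nodes in DC1 and two in DC2, the sketch keeps 3 replicas in DC1 and caps DC2 at 2.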

For example, the following CQL statement configures the dse_leases keyspace to use NetworkTopologyStrategy with a replication factor of 3 in each of the DC1 and DC2 datacenters:

ALTER KEYSPACE dse_leases
WITH REPLICATION = {
   'class': 'NetworkTopologyStrategy',
   'DC1': '3',
   'DC2': '3'
};
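When the same change has to be applied to several keyspaces, the statement above can be generated rather than typed by hand. The Python sketch below is hypothetical (not a DataStax API) and reproduces the layout of the example. It also handles quoting: unquoted CQL identifiers are folded to lowercase, so a mixed-case keyspace name such as HiveMetaStore must be double-quoted. Reserved keywords are not handled in this sketch.

```python
import re

# Hypothetical helper (not a DataStax API): renders an ALTER KEYSPACE
# statement in the same layout as the CQL example in this section.

# Lowercase names such as dse_leases can stay unquoted. Unquoted CQL
# identifiers are folded to lowercase, so a mixed-case name such as
# HiveMetaStore must be double-quoted. Reserved keywords are ignored here.
_PLAIN_NAME = re.compile(r"[a-z][a-z0-9_]*")


def quote_keyspace(name):
    if _PLAIN_NAME.fullmatch(name):
        return name
    # Inside a quoted identifier, a literal double quote is written as "".
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    # CQL string literals use single quotes; a literal ' is written as ''.
    return "'" + str(value).replace("'", "''") + "'"


def alter_keyspace_cql(keyspace, replication):
    """replication holds 'class' plus one entry per datacenter name."""
    entries = [f"{quote_literal(k)}: {quote_literal(v)}" for k, v in replication.items()]
    body = ",\n   ".join(entries)
    return (
        f"ALTER KEYSPACE {quote_keyspace(keyspace)}\n"
        f"WITH REPLICATION = {{\n   {body}\n}};"
    )


if __name__ == "__main__":
    print(alter_keyspace_cql(
        "dse_leases",
        {"class": "NetworkTopologyStrategy", "DC1": "3", "DC2": "3"},
    ))
```

Run as a script, this prints the dse_leases statement shown above. Passing "HiveMetaStore" instead produces ALTER KEYSPACE "HiveMetaStore" with the quotes in place.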

Datacenter names are case-sensitive. If needed, run the dsetool status command to confirm the exact spelling of each datacenter name.
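Because the names are case-sensitive, a name that matches a real datacenter except for letter case does not refer to that datacenter. The hypothetical Python check below is not a DSE tool. It compares the names you plan to put in the replication map against the names you read from dsetool status. It does not run dsetool or parse its output, so collecting the reported names is left to you.

```python
# Hypothetical helper (not a DSE tool): compares planned datacenter names with
# the names reported by `dsetool status`, which you collect yourself.


def check_dc_names(planned, reported):
    """Return (ok, problems), where problems lists human-readable issues."""
    reported_set = set(reported)
    # Map lowercased names back to their exact spelling to spot case slips.
    by_lower = {name.lower(): name for name in reported}
    problems = []
    for dc in planned:
        if dc in reported_set:
            continue  # exact, case-sensitive match
        match = by_lower.get(dc.lower())
        if match is not None:
            problems.append(f"{dc!r} differs only in case from {match!r}")
        else:
            problems.append(f"{dc!r} is not a known datacenter")
    return (not problems, problems)


if __name__ == "__main__":
    print(check_dc_names(["dc1", "DC2"], ["DC1", "DC2"]))
```

Here "dc1" is flagged because the cluster reports the datacenter as DC1.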

After adjusting the replication factor, run nodetool repair on each node in the affected datacenters. For example, to repair the altered dse_leases keyspace:

nodetool repair -full dse_leases
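Running the repair on every node can be scripted. The Python sketch below is hypothetical. It builds one nodetool -h host repair -full keyspace command per node and assumes that nodetool, on the machine where you run it, can reach each node's JMX port. If remote JMX access is not enabled, run nodetool repair -full locally on each node instead. The host addresses are placeholders.

```python
import shlex

# Hypothetical helper: builds one full-repair command per node.
# Assumption: nodetool can reach each node's JMX port remotely via -h.
# The addresses used below are placeholders.


def repair_commands(hosts, keyspace):
    """Return one argv list per node for a full repair of keyspace."""
    # nodetool takes the keyspace name as stored, without CQL double quotes,
    # even for mixed-case names such as HiveMetaStore.
    return [["nodetool", "-h", host, "repair", "-full", keyspace] for host in hosts]


if __name__ == "__main__":
    for argv in repair_commands(["10.0.0.1", "10.0.0.2"], "dse_leases"):
        # Print shell-safe commands; pass argv to subprocess.run to execute.
        print(shlex.join(argv))
```

Building argument lists instead of shell strings keeps each command safe to pass directly to subprocess.run. shlex.join is used only for display.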

Repeat these steps for each of the analytics keyspaces listed above. For more information, see Changing keyspace replication strategy.
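The full procedure can be laid out as an ordered plan. For each keyspace, the ALTER KEYSPACE statement comes first, followed by a full repair on every node. The Python driver below is a hypothetical sketch that only builds the plan. Executing it, for example with cqlsh and nodetool, is up to you. The replication map and host names are placeholders.

```python
# Hypothetical driver: builds the ordered steps for every analytics keyspace,
# altering each keyspace before repairing it on every node. It only builds the
# plan. The replication map and hosts passed in are placeholders.

ANALYTICS_KEYSPACES = (
    "cfs",
    "cfs_archive",
    "dse_leases",
    "dsefs",
    "HiveMetaStore",
    "spark_system",
)


def cql_name(keyspace):
    # Mixed-case names must be double-quoted in CQL; lowercase names need not be.
    return keyspace if keyspace == keyspace.lower() else f'"{keyspace}"'


def plan(replication, hosts, keyspaces=ANALYTICS_KEYSPACES):
    options = ", ".join(f"'{k}': '{v}'" for k, v in replication.items())
    steps = []
    for ks in keyspaces:
        # Step 1: change the replication settings for this keyspace.
        steps.append(("cql", f"ALTER KEYSPACE {cql_name(ks)} WITH REPLICATION = {{{options}}};"))
        # Step 2: full repair of this keyspace on each node (raw name, no quotes).
        for host in hosts:
            steps.append(("nodetool", ["nodetool", "-h", host, "repair", "-full", ks]))
    return steps


if __name__ == "__main__":
    replication = {"class": "NetworkTopologyStrategy", "DC1": "3", "DC2": "3"}
    for kind, step in plan(replication, ["node1", "node2"]):
        print(kind, step)
```

Generating the whole plan up front makes it easy to review before anything runs. It also makes it hard to skip the repair for any one keyspace, including the quoted "HiveMetaStore".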