Setting the replication factor for analytics keyspaces

Keyspaces and tables are automatically created when DSE Analytics nodes are started for the first time. The replication factor must be adjusted for these keyspaces in order for the analytics features to work properly and to avoid data loss.

The keyspaces used by DSE Analytics are the following:

  • cfs

  • cfs_archive

  • dse_leases

  • dsefs

  • "HiveMetaStore"

  • spark_system

All analytics keyspaces are initially created with the SimpleStrategy replication strategy and a replication factor (RF) of 1. Each of these must be updated in production environments to avoid data loss. After starting the cluster, alter each keyspace to use the NetworkTopologyStrategy replication strategy with appropriate replication factor and datacenter settings. For most environments using DSE Analytics, a suitable replication factor is either 3 or the cluster size, whichever is smaller.
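The rule of thumb above can be expressed as a small shell helper. This is only a sketch: suggest_rf is a hypothetical name, and the node count is a number you supply yourself; the function does not query the cluster.

```shell
#!/usr/bin/env bash
# suggest_rf (hypothetical helper): prints the smaller of 3 and the
# node count passed as the first argument. It does not contact the cluster.
suggest_rf() {
  local nodes="$1"
  if [ "$nodes" -lt 3 ]; then
    echo "$nodes"
  else
    echo 3
  fi
}

suggest_rf 2   # prints 2
suggest_rf 6   # prints 3
```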

For example, use a CQL statement to configure the dse_leases keyspace for a replication factor of 3 in both DC1 and DC2 datacenters using NetworkTopologyStrategy:

ALTER KEYSPACE dse_leases
WITH REPLICATION = {
   'class': 'NetworkTopologyStrategy',
   'DC1': '3',
   'DC2': '3'
   };

Only replicate DSE Analytics keyspaces to other DSE Analytics datacenters. DSEFS does not support replication to other datacenters, and the dsefs keyspace only contains metadata, not the data stored in DSEFS. Each DSE Analytics datacenter should have its own DSEFS instance.

Datacenter names are case-sensitive. If needed, use the dsetool status command to confirm the exact spelling of each datacenter name.

After adjusting the replication factor, run nodetool repair on each node in the affected datacenters. For example, to repair the altered keyspace dse_leases:

$ nodetool repair -full dse_leases
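Because the repair must be run on every node in the affected datacenters, it can help to script it. The sketch below assumes you already have the list of node addresses (the addresses shown are placeholders) and uses nodetool's -h option to target each host, which typically requires remote JMX access to be enabled; otherwise run the command locally on each node. print_repair_cmds is a hypothetical helper that only prints the commands so you can review them first.

```shell
#!/usr/bin/env bash
# print_repair_cmds (hypothetical helper): prints one full-repair command per
# node for the given keyspace. It runs nothing itself.
# Usage: print_repair_cmds KEYSPACE HOST...
print_repair_cmds() {
  local keyspace="$1"
  shift
  local host
  for host in "$@"; do
    printf 'nodetool -h %s repair -full %s\n' "$host" "$keyspace"
  done
}

# Placeholder addresses; replace them with the nodes in your datacenters.
print_repair_cmds dse_leases 10.0.0.1 10.0.0.2 10.0.0.3
```

After reviewing the printed commands, you could pipe them to sh to execute them one node at a time.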

Repeat these steps for each of the analytics keyspaces listed above. For more information, see Changing keyspace replication strategy.
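Writing one ALTER KEYSPACE statement per keyspace by hand is repetitive, so the statements can be generated instead. The sketch below is hypothetical (print_alter_stmts is not a DSE tool): it takes comma-separated DC:RF pairs and a list of keyspaces, and prints one NetworkTopologyStrategy statement per keyspace. The example call leaves out dsefs because, as noted above, DSEFS does not replicate to other datacenters; review every generated statement against your own topology.

```shell
#!/usr/bin/env bash
# print_alter_stmts (hypothetical helper): prints one ALTER KEYSPACE statement
# per keyspace argument, using NetworkTopologyStrategy and the comma-separated
# DC:RF pairs in the first argument. It only prints; it runs nothing.
# Usage: print_alter_stmts "DC1:3,DC2:3" KEYSPACE...
print_alter_stmts() {
  local dc_spec="$1"
  shift
  local repl="'class': 'NetworkTopologyStrategy'"
  local -a pairs
  local pair ks
  IFS=',' read -r -a pairs <<< "$dc_spec"
  for pair in "${pairs[@]}"; do
    # Datacenter names are case-sensitive, so they are copied exactly as given.
    repl+=", '${pair%%:*}': '${pair#*:}'"
  done
  for ks in "$@"; do
    printf 'ALTER KEYSPACE %s WITH REPLICATION = {%s};\n' "$ks" "$repl"
  done
}

# '"HiveMetaStore"' keeps its double quotes because that keyspace name is case-sensitive in CQL.
print_alter_stmts "DC1:3,DC2:3" cfs cfs_archive dse_leases '"HiveMetaStore"' spark_system
```

One way to apply the output is to save it to a file, review it, and run it with cqlsh -f, then run the repair for each altered keyspace.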

© 2024 DataStax