Setting the replication factor for analytics keyspaces 

Guidelines and steps to set the replication factor for keyspaces on DSE Analytics nodes.

Before you use DSE Analytics in a production environment, you must set appropriate replication factors for the keyspaces it uses. The keyspaces that require replication factor changes for DSE Analytics are:
  • cfs
  • cfs_archive
  • dse_leases
  • dsefs
  • HiveMetaStore

Default replication factors 

The default replication factor for the HiveMetaStore, cfs, dsefs, and cfs_archive system keyspaces is 1.

The dse_leases keyspace is handled differently. In a multi-datacenter cluster, only the initial datacenter has a replication factor of 1 for dse_leases: the first node creates the keyspace with a replication factor of 1 for its own datacenter. Any datacenter that you add afterward has a replication factor of 0 for dse_leases and must be configured before you start DSE Analytics nodes in it. If you run DSE Analytics in more than one datacenter, you must therefore change the replication factor of the dse_leases keyspace.
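For example, a cluster with two analytics datacenters could give dse_leases three replicas in each. This is a sketch: the datacenter names Analyticsdc1 and Analyticsdc2 are hypothetical, so substitute the names your snitch reports (as shown by nodetool status), and it assumes each datacenter has at least three nodes. ALTER KEYSPACE replaces the whole replication map, so list every datacenter that should keep replicas; a datacenter left out of the map gets none.

```cql
-- Hypothetical datacenter names; use the names reported by nodetool status.
-- The map replaces the existing replication settings, so include every
-- analytics datacenter that should hold dse_leases replicas.
ALTER KEYSPACE dse_leases
    WITH REPLICATION = {'class' : 'NetworkTopologyStrategy',
                        'Analyticsdc1' : 3,
                        'Analyticsdc2' : 3};
```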
Important: Every time you add a new datacenter, you must manually increase the replication factor of the dse_leases keyspace for the new DSE Analytics datacenter. If DataStax Enterprise or Spark security options are enabled on the cluster, you must also increase the replication factor for the dse_security keyspace across all logical datacenters.
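When security options are enabled, dse_security needs replicas in every logical datacenter, not only the analytics ones. A minimal sketch, assuming a cluster with one transactional datacenter named DC1 and one analytics datacenter named Analyticsdc1 (both names hypothetical), each with at least three nodes; the value 3 is illustrative rather than a prescribed setting:

```cql
-- Hypothetical datacenter names: DC1 (transactional) and Analyticsdc1 (analytics).
-- dse_security must be replicated to every logical datacenter in the cluster,
-- so each one appears in the map.
ALTER KEYSPACE dse_security
    WITH REPLICATION = {'class' : 'NetworkTopologyStrategy',
                        'DC1' : 3,
                        'Analyticsdc1' : 3};
```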
  • A replication factor of 1 is suitable only for development and testing on a single node; it is not suitable for a production environment.
  • For production clusters, increase the replication factor to at least 3 for each logical datacenter that is running analytics.

The number of nodes in each datacenter limits the replication factor: do not set a datacenter's replication factor higher than the number of nodes in that datacenter.

Procedure

To change the replication factors of these keyspaces:

  1. Examine the current replication settings of your keyspaces in cqlsh.
    For example:
    DESCRIBE KEYSPACE cfs;
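  2. Change the replication factor of the cfs and cfs_archive keyspaces to at least 3. In this and the following examples, replace Analyticsdc1 with the name of your analytics datacenter:
    ALTER KEYSPACE cfs
        WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'Analyticsdc1' : 3};
    ALTER KEYSPACE cfs_archive
        WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'Analyticsdc1' : 3};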
  3. To ensure correct leader election for the Spark Master and Hadoop Job Tracker nodes, change the replication factor of the dse_leases keyspace to 3, for example:
    ALTER KEYSPACE dse_leases
        WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'Analyticsdc1' : 3};
  4. If you use DSEFS, change the replication factor of the dsefs keyspace to 3, for example:
    ALTER KEYSPACE dsefs
        WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'Analyticsdc1' : 3};
  5. If you use Hive, change the replication factor of the HiveMetaStore keyspace to 3, for example:
    ALTER KEYSPACE "HiveMetaStore"
        WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'Analyticsdc1' : 3};
  6. Run nodetool repair on the nodes in each affected datacenter so that existing data is copied to the new replicas. Skipping this step can cause missing data or data unavailable exceptions.
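After the repair completes, you can check the replication settings of all of these keyspaces with a single query instead of describing each one. This sketch assumes DSE 5.0 or later, where keyspace metadata is stored in the system_schema.keyspaces table. Keyspace names are compared here as string literals, so the mixed-case HiveMetaStore needs no double quotes.

```cql
-- Show the replication map of each analytics-related keyspace.
-- Assumes DSE 5.0+ (system_schema.keyspaces); the string comparison is case-sensitive.
SELECT keyspace_name, replication
  FROM system_schema.keyspaces
  WHERE keyspace_name IN ('cfs', 'cfs_archive', 'dse_leases', 'dsefs', 'HiveMetaStore');
```

Each returned row should list NetworkTopologyStrategy with the per-datacenter counts you set. A keyspace for a feature you do not use may not exist and then returns no row.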