Setting the replication factor

Guidelines and steps to set the replication factor on DSE Analytics nodes.

The Cassandra File System (CFS) is a Hadoop Distributed File System (HDFS)-compatible storage layer. DataStax Enterprise replaces HDFS with CFS to run MapReduce jobs on Cassandra's peer-to-peer, fault-tolerant, and scalable architecture. CFS is a fundamental piece of infrastructure for all DSE Analytics nodes. CFS and Hive use three system keyspaces:
  • cfs
  • cfs_archive
  • HiveMetaStore
The default replication factor for the HiveMetaStore, cfs, and cfs_archive system keyspaces is 1.
  • A replication factor of 1 in the default Analytics data center is suitable only for development and testing on a single node. It is not suitable for a production environment.
  • For production clusters, increase the replication factor to at least 3.
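A short calculation suggests why 3 is the usual minimum. It assumes a single data center and reads and writes at the QUORUM consistency level. That consistency level is an assumption for illustration, so check the level your jobs actually use. QUORUM is the number of replicas that must respond, and RF minus QUORUM is the number of replicas that can be down without failing the request:

```latex
\text{QUORUM} = \left\lfloor \frac{RF}{2} \right\rfloor + 1,
\qquad \text{replicas that can be down} = RF - \text{QUORUM}

\begin{aligned}
RF = 1 &: \quad \text{QUORUM} = 0 + 1 = 1, \quad 1 - 1 = 0 \\
RF = 2 &: \quad \text{QUORUM} = 1 + 1 = 2, \quad 2 - 2 = 0 \\
RF = 3 &: \quad \text{QUORUM} = 1 + 1 = 2, \quad 3 - 2 = 1
\end{aligned}
```

Under these assumptions, 3 is the smallest replication factor that still serves QUORUM requests while one replica is unavailable.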

The replication factor that you choose depends on the number of nodes in each data center, as discussed in Choosing keyspace replication options. To change the replication factors of these keyspaces:

Procedure

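  1. Change the replication factor of the cfs and cfs_archive keyspaces from 1 to 3. In these examples, dc1 is a placeholder. Replace it with the name of your data center, such as Analytics. For example: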
    ALTER KEYSPACE cfs
      WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3};
    ALTER KEYSPACE cfs_archive
      WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3};
  2. If you use Hive, increase the replication factor of the HiveMetaStore keyspace from 1 to 3. The keyspace name contains uppercase letters, so enclose it in double quotes to preserve its case. For example:
    ALTER KEYSPACE "HiveMetaStore"
      WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3};
  3. Run nodetool repair on each node so that existing data is copied to the new replicas. This avoids missing data problems and data unavailable exceptions.
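To confirm the change, you can print each keyspace definition in cqlsh and check its replication map. DESCRIBE is a cqlsh shell command rather than a CQL statement, so run it from cqlsh. The output format varies by version and is not reproduced here:

```cql
-- Run in cqlsh. Each command prints the keyspace definition,
-- including its replication class and per-data-center factors.
DESCRIBE KEYSPACE cfs;
DESCRIBE KEYSPACE cfs_archive;
-- Double quotes preserve the mixed-case keyspace name.
DESCRIBE KEYSPACE "HiveMetaStore";
```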

What's next

Ensure that you appropriately configure replication for your environment.
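NetworkTopologyStrategy takes one entry per data center. If your analytics data must be replicated to more than one data center, list every data center in the replication map. The following is a minimal sketch, assuming two data centers hypothetically named dc1 and dc2, each with at least three nodes:

```cql
-- dc1 and dc2 are hypothetical names. Use the data center names that
-- nodetool status reports for your cluster.
ALTER KEYSPACE cfs
  WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 3};
ALTER KEYSPACE cfs_archive
  WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 3};
-- Double quotes preserve the mixed-case keyspace name.
ALTER KEYSPACE "HiveMetaStore"
  WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 3};
```

As with the single data center procedure, run nodetool repair afterward so that the new replicas receive existing data.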