Configuring DSEFS

DSEFS does not span datacenters. Create a separate DSEFS instance in each datacenter, as described in the following steps.

You must configure data replication. You can optionally configure multiple DSEFS instances in a datacenter, and perform other functions, including setting the Kafka log retention.

DSEFS limitations

When you configure and tune DSEFS, be aware that the following features are either unsupported or have only limited support:

Encryption. Use operating system access controls to protect the local DSEFS data directories.
File system consistency checks (fsck) and file repair have limited support. Running fsck will re-replicate blocks that were under-replicated because a node was taken out of a cluster.
File repair.
Forced rebalancing, although the cluster will eventually reach balance.
Checksum.
Automatic backups.
Multi-datacenter replication.
Symbolic links (soft links, symlinks) and hardlinks.
Snapshots.

Procedure

Configure replication for the metadata and the data blocks.

You must set the replication factor appropriately to prevent data loss if a node fails. Replication factors must be set for both the metadata and the data blocks. A replication factor of 3 for data blocks is suitable for most use cases.
1. Set global replication for the metadata in the dsefs keyspace that is stored in the database.
  
  For example, use a CQL statement to configure a replication factor of 3 on the Analytics datacenter using NetworkTopologyStrategy:
  ALTER KEYSPACE dsefs WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'Analytics': '3'};
  Datacenter names are case-sensitive. Verify the case of the datacenter name using a utility such as dsetool status.
2. Run nodetool repair on the DSEFS keyspace:
  nodetool repair dsefs
3. Set the local redundancy factor on a specific DSEFS file or directory where the data blocks are stored.
  
  For example:
  dse fs mkdir -n 4 newdirectory
  When a redundancy factor is not specified, it is inherited from the parent directory. The default redundancy factor is 3.
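
The metadata replication step can be scripted. The following is a minimal sketch, not part of DSE: the helper names alter_keyspace_cql and check_datacenter_case are hypothetical, and the list of known datacenter names is assumed to have been copied by hand from dsetool status output. It builds the ALTER KEYSPACE statement shown above and rejects a datacenter name that differs only in case.

```python
def check_datacenter_case(requested, known):
    """Return `requested` if it exactly matches a known datacenter name.

    `known` is assumed to be a list of names copied from `dsetool status`.
    """
    if requested in known:
        return requested
    # Look for a name that matches when case is ignored, to give a useful hint.
    for name in known:
        if name.lower() == requested.lower():
            raise ValueError(
                f"datacenter {requested!r} not found; did you mean {name!r}? "
                "Datacenter names are case-sensitive."
            )
    raise ValueError(f"datacenter {requested!r} not found")


def alter_keyspace_cql(keyspace, dc_replication):
    """Build an ALTER KEYSPACE statement that uses NetworkTopologyStrategy.

    `dc_replication` maps a datacenter name to its replication factor.
    """
    if not dc_replication:
        raise ValueError("at least one datacenter is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in dc_replication.items():
        if rf < 1:
            raise ValueError(f"replication factor for {dc!r} must be at least 1")
        options.append(f"'{dc}': '{rf}'")
    return "ALTER KEYSPACE " + keyspace + " WITH REPLICATION = { " + ", ".join(options) + "};"


if __name__ == "__main__":
    dc = check_datacenter_case("Analytics", ["Analytics"])
    print(alter_keyspace_cql("dsefs", {dc: 3}))
```

The generated statement still has to be run through cqlsh, followed by nodetool repair on the keyspace as described in step 2.
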
If you have multiple Analytics datacenters, you must configure each DSEFS file system to replicate within its own datacenter:
1. In the dse.yaml file, specify a separate DSEFS keyspace for each logical datacenter.
  
  For example, on a cluster with logical datacenters DC1 and DC2, you could create keyspaces named dsefs1 and dsefs2 to define separate file systems in each datacenter. Then, set the keyspace on each node in DC1:
  dsefs_options:
    ...
    keyspace_name: dsefs1
  And then set the keyspace on each node in DC2:
  dsefs_options:
    ...
    keyspace_name: dsefs2
2. Restart the nodes.
3. Alter the keyspace replication to exist only on the specific datacenters.
  
  Continuing the example with DC1 and DC2, alter DC1’s keyspace to only use DC1:
  ALTER KEYSPACE dsefs1 WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'DC1': '3'};
  Then, alter DC2’s keyspace to only use DC2:
  ALTER KEYSPACE dsefs2 WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'DC2': '3'};
4. Run nodetool repair on each DSEFS keyspace. Continuing the example:
  nodetool repair dsefs1
  nodetool repair dsefs2
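
With more than two datacenters, the per-datacenter settings are easy to get out of step. The sketch below is illustrative only: the function dsefs_plan and its numbered keyspace naming (dsefs1, dsefs2, and so on) are assumptions modelled on the DC1 and DC2 example, and it assumes repair is run against each per-datacenter keyspace. It only produces the text of the statements and commands; it does not contact the cluster.

```python
def dsefs_plan(datacenters, prefix="dsefs"):
    """Return one plan entry per datacenter.

    `datacenters` maps a datacenter name to the replication factor wanted
    for that datacenter's DSEFS keyspace. Keyspaces are numbered in the
    order the datacenters are given (hypothetical naming scheme).
    """
    plan = []
    for index, (dc, rf) in enumerate(datacenters.items(), start=1):
        keyspace = f"{prefix}{index}"
        cql = (
            "ALTER KEYSPACE " + keyspace
            + " WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', '"
            + dc + "': '" + str(rf) + "'};"
        )
        plan.append({
            "datacenter": dc,
            "keyspace": keyspace,        # value for keyspace_name in dse.yaml
            "cql": cql,                  # run in cqlsh after restarting the nodes
            "repair": f"nodetool repair {keyspace}",
        })
    return plan


if __name__ == "__main__":
    for entry in dsefs_plan({"DC1": 3, "DC2": 3}):
        print(entry["datacenter"], "->", entry["keyspace"])
        print("  " + entry["cql"])
        print("  " + entry["repair"])
```

Each keyspace in the plan replicates to exactly one datacenter, which matches the requirement that a DSEFS file system replicate only within its own datacenter.
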
When bouncing a streaming application, verify the Kafka log configuration (especially log.retention.check.interval.ms and log.retention.bytes). Ensure the Kafka log retention policy keeps data long enough to cover the time expected to bring the application and consumers back up.

For example, if the log retention policy is too aggressive and deletes or rolls logs very frequently to save disk space, users are likely to encounter issues when attempting to recover from a checkpoint that references offsets that the Kafka logs no longer retain.
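
The retention reasoning above comes down to simple arithmetic. The following sketch is a rough sanity check, not a Kafka API: the function retention_covers_outage and its parameters are hypothetical, the write rate is an estimate you supply, and a negative limit is treated as "no limit" (an assumption modelled on Kafka's defaults). It ignores segment rolling and the retention check interval, so treat a result close to the boundary as a failure.

```python
def retention_covers_outage(outage_seconds,
                            retention_ms=-1,
                            retention_bytes=-1,
                            bytes_per_second=0.0):
    """Return True if data written before the outage should still be retained.

    outage_seconds   -- expected time to bring the application back up
    retention_ms     -- time-based retention limit; negative means no limit
    retention_bytes  -- size-based retention limit; negative means no limit
    bytes_per_second -- estimated write rate into the log during the outage
    """
    # Time-based limit: the retention window must be at least as long as the outage.
    time_ok = retention_ms < 0 or retention_ms >= outage_seconds * 1000

    # Size-based limit: data written during the outage must fit within the limit,
    # otherwise older offsets are deleted before the consumers return.
    written_during_outage = bytes_per_second * outage_seconds
    size_ok = retention_bytes < 0 or retention_bytes >= written_during_outage

    return time_ok and size_ok


if __name__ == "__main__":
    # One-hour outage against a two-hour retention window and no size limit.
    print(retention_covers_outage(3600, retention_ms=7_200_000))
```

If the check fails, either lengthen the retention settings before bouncing the application or plan for a shorter recovery window.
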
