Configuring DSEFS
Steps to configure DSEFS, including configuring data replication and setting Kafka log retention.
DSEFS does not span datacenters. Create a separate DSEFS instance in each datacenter, as described in the steps below.
dse.yaml
The location of the dse.yaml file depends on the type of installation:
Package installations | /etc/dse/dse.yaml
Tarball installations | installation_location/resources/dse/conf/dse.yaml
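As an illustrative sketch of where these settings live, a minimal `dsefs_options` section in dse.yaml might look like the following (the directory paths and port values are placeholder assumptions, not recommendations):

```yaml
# dse.yaml -- DSEFS section (illustrative values)
dsefs_options:
    enabled: true
    keyspace_name: dsefs         # keyspace backing this DSEFS instance's metadata
    work_dir: /var/lib/dsefs     # local working directory (placeholder path)
    public_port: 5598
    private_port: 5599
    data_directories:
        - dir: /var/lib/dsefs/data   # local data directory (placeholder path)
          storage_weight: 1.0
          min_free_space: 268435456  # bytes kept free on the device
```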
DSEFS limitations
Know these limitations when you configure and tune DSEFS. The following functionality and features are not supported:
- Encryption. Use operating system access controls to protect the local DSEFS data directories.
- File system consistency checks (fsck) and file repair have only limited support. Running fsck will re-replicate blocks that were under-replicated because a node was taken out of a cluster.
- File repair.
- Forced rebalancing, although the cluster will eventually reach balance.
- Checksums.
- Automatic backups.
- Multi-datacenter replication.
- Symbolic links (soft links, symlinks) and hard links.
- Snapshots.
Procedure
-
Configure replication for the metadata and the data blocks.
You must set the replication factor appropriately to prevent data loss if a node fails. Replication factors must be set for both the metadata and the data blocks. A replication factor of 3 for data blocks is suitable for most use cases.
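DSEFS metadata is kept in a keyspace, so its replication factor is set with standard CQL. A hedged sketch, assuming the keyspace is named dsefs and the datacenter is named DC1 (both placeholders for your own names):

```sql
-- Set the metadata replication factor for the DSEFS keyspace
-- (keyspace and datacenter names are placeholders).
ALTER KEYSPACE dsefs
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};
```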
-
If you have multiple Analytics datacenters, you must configure each DSEFS file system to replicate within its own datacenter.
For example, in a cluster with multiple datacenters, the keyspace names dsefs1 and dsefs2 define separate file systems in each datacenter.
-
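The per-datacenter restriction above can be sketched in CQL as follows (a sketch, assuming datacenters named DC1 and DC2, with each node's dse.yaml `keyspace_name` pointing at its local file system's keyspace; all names are placeholders):

```sql
-- dsefs1 replicates only within DC1; dsefs2 only within DC2
-- (keyspace and datacenter names are placeholders).
ALTER KEYSPACE dsefs1
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};
ALTER KEYSPACE dsefs2
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC2': 3};
```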
When bouncing a streaming application, verify the Kafka log configuration, especially the log.retention.check.interval.ms and log.retention.bytes properties. Ensure the Kafka log retention policy is robust enough to cover the length of time expected to bring the application and its consumers back up. For example, if the log retention policy is too conservative and deletes or rolls the logs very frequently to save disk space, users are likely to encounter issues when attempting to recover from a checkpoint that references offsets that are no longer retained in the Kafka logs.