Configuring DSEFS
You must configure data replication. You can optionally configure multiple DSEFS file systems in a datacenter, and perform other functions, including setting the Kafka log retention.
DSEFS does not span datacenters. Create a separate DSEFS instance in each datacenter, as described in the steps below.
DSEFS limitations
Know these limitations when you configure and tune DSEFS. The following functionality and features are not supported:
-
Encryption.
Use operating system access controls to protect the local DSEFS data directories.
-
File system consistency checks (fsck) and file repair have only limited support. Running fsck re-replicates blocks that were under-replicated because a node was taken out of a cluster.
-
File repair.
-
Forced rebalancing, although the cluster eventually reaches balance.
-
Checksum.
-
Automatic backups.
-
Multi-datacenter replication.
-
Symbolic links (soft links, symlinks) and hardlinks.
-
Snapshots.
Procedure
-
Configure replication for the metadata and the data blocks.
By default, the DSEFS keyspace is created with SimpleStrategy and a replication factor of 1. After starting the cluster for the first time, you must alter the keyspace to use NetworkTopologyStrategy with an appropriate replication factor (RF).
You must set the replication factor appropriately to prevent data loss in the case of node failure. Replication factors must be set for both the metadata and the data blocks. A replication factor of 3 for data blocks is suitable for most use cases.
-
Globally: set replication for the metadata in the dsefs keyspace that is stored in the database.
For example, use a CQL statement to configure a replication factor of 3 on the Analytics datacenter using NetworkTopologyStrategy:
ALTER KEYSPACE dsefs WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'Analytics': '3'};
Datacenter names are case sensitive. Verify the case of the datacenter name with a utility, using a command like dsetool status.
-
Run nodetool repair on the DSEFS keyspace.
$ nodetool repair dsefs
-
Locally: set the replication factor on a specific DSEFS file or directory where the data blocks are stored.
For example, use the command line:
$ dse fs mkdir -n 4 newdirectory
When a replication factor (RF) is not specified, the RF is inherited from the parent directory.
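The global and local replication settings above can be scripted. The following Python sketch only builds the CQL statement and the command lines shown in this step; it does not run them. The function names (alter_keyspace_cql, repair_command, mkdir_command) are hypothetical helpers introduced for illustration and are not part of DSE.

```python
from typing import Dict, List


def alter_keyspace_cql(keyspace: str, dc_replication: Dict[str, int]) -> str:
    """Build an ALTER KEYSPACE statement that uses NetworkTopologyStrategy.

    Datacenter names are copied verbatim because they are case sensitive.
    """
    if not dc_replication:
        raise ValueError("at least one datacenter is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for datacenter, rf in dc_replication.items():
        if rf < 1:
            raise ValueError("replication factor must be at least 1")
        options.append("'%s': '%d'" % (datacenter, rf))
    return "ALTER KEYSPACE %s WITH REPLICATION = { %s };" % (
        keyspace,
        ", ".join(options),
    )


def repair_command(keyspace: str) -> List[str]:
    """Return the nodetool repair command for a keyspace as an argv list."""
    return ["nodetool", "repair", keyspace]


def mkdir_command(directory: str, rf: int) -> List[str]:
    """Return the dse fs mkdir command that sets a per-directory RF (-n)."""
    return ["dse", "fs", "mkdir", "-n", str(rf), directory]


if __name__ == "__main__":
    # Values taken from the examples in this step.
    print(alter_keyspace_cql("dsefs", {"Analytics": 3}))
    print(" ".join(repair_command("dsefs")))
    print(" ".join(mkdir_command("newdirectory", 4)))
```

Returning the commands as argument lists keeps the sketch free of side effects; pass them to a process runner of your choice only after confirming the datacenter name with dsetool status.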
Where is the dse.yaml file?
The location of the dse.yaml file depends on the type of installation:
Installation Type | Location
Package installations + Installer-Services installations | /etc/dse/dse.yaml
Tarball installations + Installer-No Services installations | <installation_location>/resources/dse/conf/dse.yaml
-
If you have multiple Analytics datacenters, you must configure each DSEFS file system to replicate within its own datacenter:
-
In the dse.yaml file, specify a separate DSEFS keyspace for each logical datacenter.
For example, on a cluster with logical datacenters DC1 and DC2:
On each node in DC1:
dsefs_options:
    ...
    keyspace_name: dsefs1
On each node in DC2:
dsefs_options:
    ...
    keyspace_name: dsefs2
-
Restart the nodes.
-
Alter the keyspace replication to exist only on the specific datacenters.
On DC1:
ALTER KEYSPACE dsefs1 WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'DC1': '3'};
On DC2:
ALTER KEYSPACE dsefs2 WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'DC2': '3'};
-
Run nodetool repair on each DSEFS keyspace. With the keyspace names used in this example:
$ nodetool repair dsefs1
$ nodetool repair dsefs2
For example, in a cluster with multiple datacenters, the keyspace names dsefs1 and dsefs2 define separate file systems in each datacenter.
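The per-datacenter steps above follow the same pattern for every datacenter, so they can be generated from a single mapping of datacenter name to keyspace name. The Python sketch below is a minimal illustration under stated assumptions: DsefsPlan and plan_per_datacenter are hypothetical names, the replication factor defaults to 3 as in the example, and the repair command is assumed to target each datacenter's own keyspace name. It only produces text; nothing is executed.

```python
from typing import Dict, List, NamedTuple


class DsefsPlan(NamedTuple):
    """Statements and commands for one datacenter (hypothetical container)."""
    datacenter: str
    keyspace: str
    alter_cql: str
    repair_cmd: List[str]


def plan_per_datacenter(keyspaces: Dict[str, str], rf: int = 3) -> List[DsefsPlan]:
    """Build one plan per datacenter; each keyspace replicates only in its own DC."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    if len(set(keyspaces.values())) != len(keyspaces):
        # Sharing a keyspace name would make the file systems span datacenters.
        raise ValueError("each datacenter needs its own DSEFS keyspace name")
    plans = []
    for datacenter, keyspace in keyspaces.items():
        cql = (
            "ALTER KEYSPACE %s WITH REPLICATION = "
            "{ 'class': 'NetworkTopologyStrategy', '%s': '%d' };"
            % (keyspace, datacenter, rf)
        )
        plans.append(
            DsefsPlan(datacenter, keyspace, cql, ["nodetool", "repair", keyspace])
        )
    return plans


if __name__ == "__main__":
    # Datacenter and keyspace names from the example in this step.
    for plan in plan_per_datacenter({"DC1": "dsefs1", "DC2": "dsefs2"}):
        print("%s: keyspace_name: %s" % (plan.datacenter, plan.keyspace))
        print("  " + plan.alter_cql)
        print("  " + " ".join(plan.repair_cmd))
```

The uniqueness check reflects the requirement that each DSEFS file system replicates within its own datacenter; a duplicated keyspace name is rejected before any statement is generated.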
-
When restarting (bouncing) a streaming application, verify the Kafka log configuration (especially log.retention.check.interval.ms and policies.log.retention.bytes). Ensure the Kafka log retention policy is robust enough to cover the length of time expected to bring the application and consumers back up.
For example, if the log retention policy is too conservative with disk space and deletes or rolls logs very frequently, users are likely to encounter issues when attempting to recover from a checkpoint that references offsets that are no longer maintained by the Kafka logs.
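One way to reason about whether a retention policy is robust enough is to compare the time-based retention with the expected downtime. The Python sketch below is a rough illustration, not a DSE or Kafka tool: parse_properties, retention_ms, and covers_downtime are hypothetical helpers, and the safety factor of 2 is an arbitrary assumption. It relies on the standard Kafka broker settings log.retention.ms, log.retention.minutes, and log.retention.hours (default 168), and it ignores size-based retention, which can delete segments earlier.

```python
from typing import Dict, Optional


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def retention_ms(props: Dict[str, str]) -> Optional[int]:
    """Return time-based retention in ms, or None when no time limit applies.

    Precedence follows Kafka: ms, then minutes, then hours (default 168).
    """
    if "log.retention.ms" in props:
        value = int(props["log.retention.ms"])
        return None if value < 0 else value  # -1 means no time limit
    if "log.retention.minutes" in props:
        return int(props["log.retention.minutes"]) * 60 * 1000
    return int(props.get("log.retention.hours", "168")) * 60 * 60 * 1000


def covers_downtime(props: Dict[str, str], expected_downtime_ms: int,
                    safety_factor: float = 2.0) -> bool:
    """True when retention exceeds the expected downtime by the safety factor."""
    retention = retention_ms(props)
    if retention is None:
        return True
    return retention >= expected_downtime_ms * safety_factor


if __name__ == "__main__":
    sample = "log.retention.hours=1\nlog.retention.check.interval.ms=300000\n"
    props = parse_properties(sample)
    # A 45-minute restart with a 2x margin needs 90 minutes of retention.
    print(covers_downtime(props, 45 * 60 * 1000))
```

A False result suggests that offsets referenced by a checkpoint could be deleted before consumers come back up; a True result does not rule that out when a size limit is also configured.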