Configuring DSEFS

You must configure data replication. You can optionally configure multiple DSEFS file systems in a cluster, one per datacenter, and perform other functions, including setting the Kafka log retention.

DSEFS does not span datacenters. Create a separate DSEFS instance in each datacenter, as described in the steps below.

DSEFS limitations

Know these limitations when you configure and tune DSEFS. The following functionality and features are either unsupported or have only limited support:

  • Encryption.

    Use operating system access controls to protect the local DSEFS data directories.

  • File system consistency checks (fsck) and file repair have only limited support. Running fsck re-replicates blocks that were under-replicated because a node was taken out of a cluster (see the example after this list).

  • Forced rebalancing, although the cluster eventually reaches balance.

  • Checksum.

  • Automatic backups.

  • Multi-datacenter replication.

  • Symbolic links (soft links, symlinks) and hardlinks.

  • Snapshots.
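
Although fsck support is limited, you can run the consistency check described above from the DSEFS shell. For example, a minimal invocation (available options vary by DSE version):

  $ dse fs fsck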

Procedure

  1. Configure replication for the metadata and the data blocks.

    DSEFS keyspace creation uses SimpleStrategy with a replication factor of 1. After starting the cluster for the first time, you must alter the keyspace to use NetworkTopologyStrategy with an appropriate replication factor.

    You must set the replication factor appropriately to prevent data loss if a node fails. Set replication factors for both the metadata and the data blocks. A replication factor of 3 for data blocks is suitable for most use cases.

    1. Globally: set replication for the metadata in the dsefs keyspace that is stored in the database.

      For example, use a CQL statement to configure a replication factor of 3 on the Analytics datacenter using NetworkTopologyStrategy:

      ALTER KEYSPACE dsefs
      WITH REPLICATION = {
         'class': 'NetworkTopologyStrategy',
         'Analytics': '3'};

      Datacenter names are case sensitive. Verify the case of the datacenter name with a utility such as dsetool status.
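
      You can also query the datacenter name of the node that you are connected to directly from cqlsh:

      SELECT data_center FROM system.local;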

    2. Run nodetool repair on the DSEFS keyspace.

      $ nodetool repair dsefs
    3. Locally: set the replication factor on a specific DSEFS file or directory where the data blocks are stored.

      For example, create a directory with a replication factor of 4 from the command line:

      $ dse fs mkdir -n 4 newdirectory

      When a replication factor (RF) is not specified, the RF is inherited from the parent directory.
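
      To confirm the settings on a DSEFS path, inspect it from the DSEFS shell. For example, the stat command displays the status of a file or directory (the exact fields shown vary by DSE version):

      $ dse fs stat newdirectory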

      Where is the dse.yaml file?

      The location of the dse.yaml file depends on the type of installation:

      • Package installations and Installer-Services installations: /etc/dse/dse.yaml

      • Tarball installations and Installer-No Services installations: <installation_location>/resources/dse/conf/dse.yaml

  2. If you have multiple Analytics datacenters, you must configure each DSEFS file system to replicate within its own datacenter:

    1. In the dse.yaml file, specify a separate DSEFS keyspace for each logical datacenter.

      For example, on a cluster with logical datacenters DC1 and DC2:

      On each node in DC1:

      dsefs_options:
          ...
          keyspace_name: dsefs1

      On each node in DC2:

      dsefs_options:
          ...
          keyspace_name: dsefs2
    2. Restart the nodes.
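
      For example, on a package installation, restart each node with the following command. (On a tarball installation, stop and start DSE from <installation_location>/bin instead.)

      $ sudo service dse restart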

    3. Alter each keyspace's replication so that it exists only in its own datacenter.

      On DC1:

      ALTER KEYSPACE dsefs1
      WITH REPLICATION = {
         'class': 'NetworkTopologyStrategy',
         'DC1': '3'};

      On DC2:

      ALTER KEYSPACE dsefs2
      WITH REPLICATION = {
         'class': 'NetworkTopologyStrategy',
         'DC2': '3'};
    4. Run nodetool repair on each DSEFS keyspace.

      On DC1:

      $ nodetool repair dsefs1

      On DC2:

      $ nodetool repair dsefs2

    In this example, the keyspace names dsefs1 and dsefs2 define separate DSEFS file systems, one in each datacenter.
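
    To verify that each keyspace replicates only within its own datacenter, inspect the schema from cqlsh:

    SELECT keyspace_name, replication
    FROM system_schema.keyspaces
    WHERE keyspace_name IN ('dsefs1', 'dsefs2');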

  3. When bouncing a streaming application, verify the Kafka log retention configuration, especially log.retention.check.interval.ms and log.retention.bytes. Ensure the Kafka log retention policy is robust enough to cover the length of time expected to bring the application and its consumers back up.

    For example, if the log retention policy is too conservative and deletes or rolls the logs very frequently to save disk space, users are likely to encounter issues when attempting to recover from a checkpoint that references offsets no longer retained in the Kafka logs.
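
    These retention settings are defined in the Kafka broker's server.properties file. The following values are Kafka's defaults and are shown for illustration only; tune them so that retention comfortably exceeds the expected recovery window:

    # server.properties (Kafka broker); default values, shown for illustration
    log.retention.hours=168
    log.retention.bytes=-1
    log.retention.check.interval.ms=300000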
