Steps to use DSEFS, configure data replication, and perform other functions, including setting
the Kafka log retention.
You must configure data replication. You can optionally configure multiple DSEFS file systems in a datacenter and perform other functions,
including setting the Kafka log retention.
The location of the dse.yaml file depends on the type of installation:
Installer-Services | /etc/dse/dse.yaml
Package installations | /etc/dse/dse.yaml
Installer-No Services | install_location/resources/dse/conf/dse.yaml
Tarball installations | install_location/resources/dse/conf/dse.yaml
DSEFS limitations
Know these limitations when you configure and tune DSEFS. The following functionality and features are not supported:
Procedure
- Required:
Configure replication for the metadata and the data blocks.
You must set the replication factor appropriately to prevent data loss in the case of node
failure. Replication factors must be set for both the metadata and the data blocks.
- Globally: set replication for the metadata in the dsefs keyspace that is stored in the Cassandra database.
For example, use a CQL statement to configure a replication factor of 3 on the Analytics datacenter using NetworkTopologyStrategy:
ALTER KEYSPACE dsefs WITH replication = {'class': 'NetworkTopologyStrategy', 'Analytics': '3'};
- Locally: set replication per DSEFS file or directory where the data blocks are stored.
For example, use the command line:
Installer-Services and Package installations: sudo dse cassandra-stop
$ sudo dse cassandra options
Installer-No Services and Tarball installations: install_location/bin/dse cassandra-stop
$ install_location/bin/dse cassandra options
When a replication factor (RF) is not specified, the RF is inherited from the parent directory.
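The global replication setting in this step can be generated rather than typed by hand. The following is a minimal sketch, not part of DSE: the helper name `alter_keyspace_cql` and its arguments are hypothetical, and it only builds the CQL text shown above. Running the statement (for example in cqlsh) is left to the operator.

```python
def alter_keyspace_cql(keyspace, dc_replication):
    """Build an ALTER KEYSPACE statement that uses NetworkTopologyStrategy.

    keyspace        -- keyspace name, for example "dsefs"
    dc_replication  -- dict mapping datacenter name to replication factor
    (Hypothetical helper for illustration; it does not contact a cluster.)
    """
    if not dc_replication:
        raise ValueError("at least one datacenter is required")
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for datacenter, rf in dc_replication.items():
        if int(rf) < 1:
            raise ValueError("replication factor must be at least 1")
        parts.append("'" + datacenter + "': '" + str(int(rf)) + "'")
    body = ", ".join(parts)
    return "ALTER KEYSPACE " + keyspace + " WITH replication = {" + body + "};"


if __name__ == "__main__":
    # Reproduces the statement used for the Analytics datacenter above.
    print(alter_keyspace_cql("dsefs", {"Analytics": 3}))
```

Keeping the datacenter-to-RF mapping in one place makes it easier to review that the metadata keyspace is replicated where intended before the statement is applied.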
- Optional:
Configure multiple DSEFS file systems within a single datacenter:
- In the dse.yaml file, specify a separate DSEFS keyspace for each logical datacenter.
For example, on a cluster with logical datacenters DC1 and DC2:
On each node in DC1:
dsefs_options:
    ...
    keyspace_name: dsefs1
On each node in DC2:
dsefs_options:
    ...
    keyspace_name: dsefs2
- Restart the nodes.
- Alter the keyspace replication to exist only on the specific datacenters.
On DC1:
ALTER KEYSPACE dsefs1 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'};
On DC2:
ALTER KEYSPACE dsefs2 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC2': '3'};
For example, in a cluster with multiple datacenters, the keyspace names dsefs1 and
dsefs2 define separate file systems in each datacenter.
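After altering the keyspaces, it can be useful to confirm that each DSEFS keyspace is replicated only to its own datacenter. The sketch below is an illustration under stated assumptions: the function `replicates_only_to` is hypothetical, and the input dictionaries are assumed to have the same shape as the `replication` map stored for a keyspace in `system_schema.keyspaces`. Fetching that map from the cluster is not shown.

```python
def replicates_only_to(replication, datacenter):
    """Return True if the replication map names exactly one datacenter.

    replication -- dict such as {"class": "...NetworkTopologyStrategy", "DC1": "3"}
    datacenter  -- the only datacenter that should hold replicas
    (Hypothetical check for illustration; it does not query a cluster.)
    """
    strategy = replication.get("class", "")
    if not strategy.endswith("NetworkTopologyStrategy"):
        return False
    # Every key other than "class" is treated as a datacenter name.
    datacenters = {key for key in replication if key != "class"}
    return datacenters == {datacenter}


if __name__ == "__main__":
    dsefs1 = {"class": "org.apache.cassandra.locator.NetworkTopologyStrategy", "DC1": "3"}
    misconfigured = {"class": "NetworkTopologyStrategy", "DC1": "3", "DC2": "3"}
    print(replicates_only_to(dsefs1, "DC1"))
    print(replicates_only_to(misconfigured, "DC2"))
```

In this example the first map passes the check and the second does not, because it still places replicas in both datacenters.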
- When bouncing (restarting) a streaming application, verify the Kafka log configuration (especially log.retention.check.interval.ms and policies.log.retention.bytes). Ensure the Kafka log retention policy is robust enough to cover the time expected to bring the application and consumers back up.
For example, if the log retention policy is too conservative with disk space and deletes or rolls logs very frequently, users are likely to encounter issues when attempting to recover from a checkpoint that references offsets that the Kafka logs no longer retain.
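The retention requirement in this step can be estimated with simple arithmetic. The sketch below is a rough sizing aid, not a Kafka or DSE API: the function names, the per-partition ingest rate, and the safety factor of 2 are all assumptions chosen for illustration. It also assumes the byte-based limit is applied per partition, as Kafka does for its broker-level log.retention.bytes setting.

```python
def min_retention(downtime_s, ingest_bytes_per_s, safety_factor=2.0):
    """Estimate retention that outlasts an expected outage.

    downtime_s          -- expected time to bring the application back, in seconds
    ingest_bytes_per_s  -- assumed write rate for one partition, in bytes per second
    safety_factor       -- assumed margin on top of the expected downtime
    Returns (retention_ms, retention_bytes).
    """
    if downtime_s < 0 or ingest_bytes_per_s < 0 or safety_factor < 1:
        raise ValueError("inputs must be non-negative and safety_factor >= 1")
    window_s = downtime_s * safety_factor
    return int(window_s * 1000), int(ingest_bytes_per_s * window_s)


def checkpoint_recoverable(checkpoint_offset, earliest_retained_offset):
    """A checkpoint is usable only if its offset is still in the log."""
    return checkpoint_offset >= earliest_retained_offset


if __name__ == "__main__":
    # Two-hour outage, 1 MiB/s into the partition, default 2x margin.
    retention_ms, retention_bytes = min_retention(2 * 60 * 60, 1024 * 1024)
    print(retention_ms, retention_bytes)
    # Offset 500 was already deleted if the log now starts at offset 900.
    print(checkpoint_recoverable(500, 900))
```

If the configured retention is below these estimates, a checkpoint taken before the outage may point at offsets that have already been removed, which is the failure described above.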