Analytics node configuration
Steps to configure analytic Hadoop nodes.
Important configuration changes, excluding those related to the job tracker, are:
- Disable virtual nodes
- Set the replication factor
- Configure the verbosity of log messages
Advanced users can also configure DataStax Enterprise to run jobs remotely.
Disable virtual nodes
DataStax recommends using virtual nodes only on data centers running Cassandra real-time workloads. You should disable virtual nodes on data centers running either Hadoop or Solr workloads.
To disable virtual nodes:
- In the cassandra.yaml file, set num_tokens to 1.
num_tokens: 1
- Uncomment the initial_token property and set it to 1 or to the value of a generated token for a multi-node cluster.
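For a multi-node cluster, each node needs its own initial_token value. The following is a minimal sketch of one common way to generate evenly spaced tokens, not the only method. The function name generate_tokens and the partitioner labels are hypothetical, chosen for illustration. The sketch assumes the standard token ranges of Murmur3Partitioner (-2^63 to 2^63 - 1) and RandomPartitioner (0 to 2^127 - 1); check which partitioner your cluster uses before applying the values.

```python
# Sketch: evenly spaced initial_token values for a cluster without virtual nodes.
# Assumption: the token ranges below are those of the two common partitioners.

def generate_tokens(node_count, partitioner="murmur3"):
    """Return one evenly spaced token per node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if partitioner == "murmur3":
        # Murmur3Partitioner tokens span -2**63 .. 2**63 - 1.
        return [(i * 2**64) // node_count - 2**63 for i in range(node_count)]
    if partitioner == "random":
        # RandomPartitioner tokens span 0 .. 2**127 - 1.
        return [(i * 2**127) // node_count for i in range(node_count)]
    raise ValueError("unknown partitioner: %r" % partitioner)


if __name__ == "__main__":
    # Print one initial_token line per node of a four-node cluster.
    for node, token in enumerate(generate_tokens(4)):
        print("node %d: initial_token: %d" % (node, token))
```

Assign one generated value to initial_token in the cassandra.yaml file of each node.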
Setting the replication factor
The default replication factor for the HiveMetaStore, cfs, and cfs_archive system keyspaces is 1. This default, which uses the default data center named Analytics, is intended for development and testing on a single node, not for a production environment. For production clusters, increase the replication factor to at least 2 so that the cluster tolerates single-node failures. For example, to set a replication factor of 3 in a data center named dc1:
ALTER KEYSPACE cfs WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3};
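Because the same change applies to all three system keyspaces, it can help to generate the statements rather than type them. The following sketch builds one ALTER KEYSPACE statement per keyspace. The helper name alter_statements is hypothetical, and the data center name and replication factor are placeholders to replace with your own. Keyspace names are double-quoted so that the mixed-case name HiveMetaStore is not folded to lowercase by CQL.

```python
# Sketch: build ALTER KEYSPACE statements for the DSE Hadoop system keyspaces.
# Assumption: the data center names and factors passed in match your cluster.

SYSTEM_KEYSPACES = ("HiveMetaStore", "cfs", "cfs_archive")


def alter_statements(dc_replication, keyspaces=SYSTEM_KEYSPACES):
    """Return one CQL statement per keyspace for the given per-DC factors."""
    options = ", ".join(
        "'%s' : %d" % (dc, factor) for dc, factor in sorted(dc_replication.items())
    )
    template = (
        'ALTER KEYSPACE "%s" WITH REPLICATION = '
        "{'class' : 'NetworkTopologyStrategy', %s};"
    )
    return [template % (keyspace, options) for keyspace in keyspaces]


if __name__ == "__main__":
    # 'dc1' and 3 are placeholders, as in the example statement above.
    for statement in alter_statements({"dc1": 3}):
        print(statement)
```

The printed statements can be pasted into a CQL shell session.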
Configuring the verbosity of log messages
To adjust the verbosity of log messages for Hadoop map/reduce tasks, add the following settings to the log4j.properties file on each analytic node:
log4j.logger.org.apache.hadoop.mapred=WARN
log4j.logger.org.apache.hadoop.filecache=WARN
- Installer-Services and Package installations: /etc/dse/cassandra/
- Installer-No Services and Tarball installations: install_location/resources/cassandra/conf/
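When the two Hadoop logger settings have to be applied on many analytic nodes, a small script can avoid duplicate or conflicting entries. The following sketch works on the text of a log4j.properties file: it updates a setting that already exists and appends any that are missing. The function name apply_settings is hypothetical, and reading and writing the file (and its path) is left to the caller.

```python
# Sketch: set the Hadoop logger levels in the text of a log4j.properties file.
# Assumption: each setting sits on its own line in key=value form.

HADOOP_LOG_SETTINGS = {
    "log4j.logger.org.apache.hadoop.mapred": "WARN",
    "log4j.logger.org.apache.hadoop.filecache": "WARN",
}


def apply_settings(text, settings=HADOOP_LOG_SETTINGS):
    """Return the properties text with each setting updated or appended."""
    remaining = dict(settings)  # copy, so the caller's dict is not modified
    lines = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in remaining:
            # Replace the first existing entry for this logger.
            lines.append("%s=%s" % (key, remaining.pop(key)))
        else:
            lines.append(line)
    # Append any settings that were not already present.
    lines.extend("%s=%s" % item for item in remaining.items())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "log4j.rootLogger=INFO\nlog4j.logger.org.apache.hadoop.mapred=DEBUG\n"
    print(apply_settings(sample), end="")
```

Running the function a second time on its own output leaves the text unchanged.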
Configuration for running jobs on a remote cluster
This information is intended for advanced users.
Procedure
To connect to external addresses: