Analytics node configuration
Steps to configure analytic Hadoop nodes.
Important configuration changes, excluding those related to the job tracker, are:
- Disabling virtual nodes
- Setting the replication factor
- Configuring the verbosity of log messages
- Connecting to non-standard Cassandra native port
Advanced users can also configure DataStax Enterprise to run jobs remotely.
Disabling virtual nodes
DataStax Enterprise turns off virtual nodes (vnodes) by default. DataStax does not recommend turning on vnodes for Hadoop or Solr nodes, but you can use vnodes for any Cassandra-only cluster, or a Cassandra-only data center in a mixed Hadoop/Solr/Cassandra deployment. If you have enabled virtual nodes on Hadoop nodes, disable virtual nodes before using the cluster.
Setting the replication factor
The default replication factor for the HiveMetaStore, cfs, and cfs_archive system keyspaces is 1. A replication factor of 1 in the default data center, Analytics, is intended for development and testing on a single node, not for a production environment. For production clusters, increase the replication factor to at least 2. The higher replication factor ensures resilience to single-node failures. For example:
ALTER KEYSPACE cfs WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3};
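The example above changes only cfs, while the text names three system keyspaces. The following is a minimal sketch, not an official tool, that builds the equivalent statement for each of them. The helper names are hypothetical, and 'dc1' and 3 are placeholders taken from the example. It assumes that HiveMetaStore needs double quotes because unquoted CQL identifiers are folded to lowercase.

```python
from typing import List

# Keyspace names come from the text above; the data center name and
# replication factor passed in are placeholders for your own values.
SYSTEM_KEYSPACES = ["HiveMetaStore", "cfs", "cfs_archive"]


def quote_keyspace(name: str) -> str:
    """Double-quote names containing uppercase letters so CQL keeps their case."""
    return name if name == name.lower() else f'"{name}"'


def alter_keyspace_statements(datacenter: str, replication_factor: int) -> List[str]:
    """Build one ALTER KEYSPACE statement per system keyspace."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return [
        f"ALTER KEYSPACE {quote_keyspace(ks)} WITH REPLICATION = "
        f"{{'class' : 'NetworkTopologyStrategy', '{datacenter}' : {replication_factor}}};"
        for ks in SYSTEM_KEYSPACES
    ]


if __name__ == "__main__":
    # Print the statements so they can be reviewed before running them in cqlsh.
    for statement in alter_keyspace_statements("dc1", 3):
        print(statement)
```

The script only prints the statements. Running them against the cluster is left as a manual step so that the data center name and replication factor can be checked first.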
Configuring the verbosity of log messages
To adjust the verbosity of log messages for Hadoop map/reduce tasks, add the following settings to the log4j.properties file on each analytic node:
log4j.logger.org.apache.hadoop.mapred=WARN
log4j.logger.org.apache.hadoop.filecache=WARN
Default Hadoop log4j-server.properties locations:
- Installer-Services and Package installations: /etc/dse/hadoop/
- Installer-No Services and Tarball installations: install_location/resources/hadoop/conf/
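Because the two logger settings must be applied on each analytic node, a small script can make the change repeatable. The following is a minimal sketch under stated assumptions: the function name is hypothetical, it works on the text of a properties file rather than on a specific path, and it treats each setting as a simple key=value line.

```python
# The two settings named in the text above.
SETTINGS = {
    "log4j.logger.org.apache.hadoop.mapred": "WARN",
    "log4j.logger.org.apache.hadoop.filecache": "WARN",
}


def apply_settings(text: str, settings: dict) -> str:
    """Return properties text with each key set to its value.

    The first existing line for a key is replaced in place; keys that are
    not present are appended at the end. Other lines are kept unchanged.
    """
    remaining = dict(settings)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "log4j.rootLogger=INFO\nlog4j.logger.org.apache.hadoop.mapred=DEBUG\n"
    print(apply_settings(sample, SETTINGS), end="")
```

Applying the function twice gives the same result as applying it once, so the script can be rerun safely. Reading and writing the actual file in one of the locations listed above is left out deliberately.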
Connecting to non-standard Cassandra native port
If the Cassandra native port was changed to a port other than the default port 9042, you must change the cassandra.input.native.port configuration setting for Hive and Hadoop to use the non-default port. The following examples change Cassandra native protocol connections to use port 9999.
- Inside the Hive shell, set the port after starting DSE Hive:
$ dse hive
hive> set cassandra.input.native.port=9999;
- For Hive in general, add cassandra.input.native.port to the hive-site.xml file:
<property>
  <name>cassandra.input.native.port</name>
  <value>9999</value>
</property>
- For Hadoop, add cassandra.input.native.port to the core-site.xml file:
<property>
  <name>cassandra.input.native.port</name>
  <value>9999</value>
</property>
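The hive-site.xml and core-site.xml edits above have the same shape, so they can be scripted in one way. The following is a minimal sketch, assuming the file uses the usual Hadoop layout of a configuration root element containing property elements. The function name set_property is hypothetical. Note that Python's ElementTree drops XML comments when it parses a file, so edit by hand if the comments matter.

```python
import xml.etree.ElementTree as ET


def set_property(config_xml: str, name: str, value: str) -> str:
    """Return config_xml with the named property set, adding it if absent."""
    root = ET.fromstring(config_xml)
    if root.tag != "configuration":
        raise ValueError("expected a <configuration> root element")
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            # Property already exists: update its value in place.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No matching property found: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "<configuration></configuration>"
    print(set_property(sample, "cassandra.input.native.port", "9999"))
```

The function works on the XML text rather than on a file path, which keeps it easy to check against a copy of the file before the real one is overwritten.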
Configuration for running jobs on a remote cluster
This information is intended for advanced users.
Procedure
To connect to external addresses: