Configuring an external Hadoop system
Perform configuration tasks after you install DataStax Enterprise.
- Configure Kerberos on the Hadoop cluster.
- Configure Java on the Hadoop cluster.
- Install Hive 0.12 on the Hadoop cluster.
- Configure BYOH environment variables on nodes in the BYOH data center.
Configuring Kerberos (optional)
To use Kerberos to protect your data, configure Hadoop security under Kerberos on your Hadoop cluster. For information about configuring Hadoop security, see "Using Cloudera Manager to Configure Hadoop Security" or the Hortonworks documentation.
Configuring Java
BYOH requires the external Hadoop system to use Java 7. Install Java 7 and ensure that the Cloudera or Hortonworks cluster is configured to use it.
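One way to confirm which Java is active on a Hadoop node is to parse the `java -version` banner. The sketch below assumes the classic `java version "1.7.0_xx"` banner format; `java_major` is a hypothetical helper name, not part of DataStax Enterprise.

```shell
# Extract the "1.7"-style major.minor from a `java -version` banner line.
# Assumes the quoted-version banner format used by Oracle/OpenJDK 7.
java_major() {
    # Field 2 (between the quotes) is the version string, e.g. 1.7.0_80.
    echo "$1" | awk -F'"' '{split($2, v, "."); print v[1] "." v[2]}'
}

# On a real node you would feed it live output:
#   java_major "$(java -version 2>&1 | head -1)"
java_major 'java version "1.7.0_80"'   # prints 1.7
```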
Configuring Hive
- Cloudera uses a fork of Hive.
- Old versions of Hortonworks use Hive 0.11.
BYOH requires Apache Hive 0.12, so you need to install Hive 0.12 and configure BYOH to use this installation.
- Download Hive 0.12 from http://apache.mirrors.pair.com/hive/hive-0.12.0/hive-0.12.0.tar.gz.
- Unpack the archive to install Hive.
$ tar -xzvf hive-0.12.0.tar.gz
- If you move the Hive installation, avoid writing over the earlier version installed by Cloudera Manager or Ambari. For example, rename the Hive fork if necessary.
- Move the new Hive installation to the following location:
$ sudo mv hive-0.12.0 /usr/lib/hive12
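After the move, a quick sanity check is to confirm that the hive launcher exists under the new location. This is a minimal sketch; `check_hive_home` is a hypothetical helper, and the path matches the mv command above.

```shell
# Report whether the hive launcher is present and executable under
# the directory given as $1 (e.g. /usr/lib/hive12).
check_hive_home() {
    if [ -x "$1/bin/hive" ]; then
        echo "found"
    else
        echo "missing"
    fi
}

check_hive_home /usr/lib/hive12
```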
Configuring BYOH environment variables
The DataStax Enterprise installation includes a configuration file that sets up the DataStax Enterprise environment. Make these changes on all nodes in the BYOH data center.
- Open the byoh-env.sh file in the following directory:
- Installer-Services and Package installations: /etc/dse/byoh-env.sh
- Installer-No Services and Tarball installations: install_location/bin/byoh-env.sh
- Set the DSE_HOME environment variable to the DataStax Enterprise installation directory.
- Package installations:
export DSE_HOME="/etc/dse"
- Installer-Services installations:
export DSE_HOME="/usr/share/dse"
- Installer-No Services and Tarball installations:
export DSE_HOME="install_location"
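Before relying on byoh-env.sh, it can help to verify that the candidate DSE_HOME actually exists. This is a sketch with a hypothetical helper name; substitute the path for your installation type from the list above.

```shell
# Print OK or MISSING for the candidate DSE_HOME directory given as $1.
check_dse_home() {
    if [ -d "$1" ]; then
        echo "OK: $1"
    else
        echo "MISSING: $1"
    fi
}

# Installer-Services default from the list above; adjust for your install type.
check_dse_home "/usr/share/dse"
```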
- Edit byoh-env.sh to point the HIVE_HOME variable to the new Hive installation:
HIVE_HOME="/usr/lib/hive12"
- Check that other configurable variables match the location of components in your environment.
- Configure byoh-env.sh for using Pig by editing the IP addresses to reflect your environment. For example, on a single-node cluster:
export PIG_INITIAL_ADDRESS=127.0.0.1
export PIG_OUTPUT_INITIAL_ADDRESS=127.0.0.1
export PIG_INPUT_INITIAL_ADDRESS=127.0.0.1
- If a Hadoop data node is not running on the local machine, configure the DATA_NODE_LIST
and NAME_NODE variables as follows:
- DATA_NODE_LIST
Provide a comma-separated list of Hadoop data node IP addresses this machine can access. The list is set to mapreduce.job.hdfs-servers in the client configuration.
- NAME_NODE
Provide the name or IP address of the name node. For example:
export DATA_NODE_LIST="192.168.1.1, 192.168.1.2, 192.168.1.3"
export NAME_NODE="localhost"
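Because DATA_NODE_LIST must be a comma-separated list, a small check that the value parses into the expected number of nodes can catch formatting mistakes. This is a sketch; `count_data_nodes` is a hypothetical helper, and the addresses are the examples from this guide, not real cluster nodes.

```shell
# Split a comma-separated node list, trim surrounding whitespace,
# and print the number of non-empty entries.
count_data_nodes() {
    echo "$1" | tr ',' '\n' | sed 's/^ *//;s/ *$//' | grep -c .
}

DATA_NODE_LIST="192.168.1.1, 192.168.1.2, 192.168.1.3"
count_data_nodes "$DATA_NODE_LIST"   # prints 3
```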