Configuring an external Hadoop system

Perform configuration tasks after you install DataStax Enterprise.

After installing DataStax Enterprise, perform the following configuration tasks:
  • Configure Kerberos on the Hadoop cluster.
  • Configure Java on the Hadoop cluster.
  • Install Hive 0.12 on the Hadoop cluster.
  • Configure BYOH environment variables on nodes in the BYOH data center.

Configuring Kerberos (optional) 

To use Kerberos to protect your data, configure Hadoop security under Kerberos on your Hadoop cluster. For information about configuring Hadoop security, see "Using Cloudera Manager to Configure Hadoop Security" or the Hortonworks documentation.

Configuring Java 

BYOH requires the external Hadoop system to use Java 7. Install Java 7 and ensure that your Cloudera or Hortonworks cluster is configured to use it.
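As a quick sanity check on each Hadoop node, the Java version on the PATH can be inspected with a small script. This is a minimal sketch, not part of DataStax Enterprise: java_major is a hypothetical helper name, and the script assumes that the java binary on the PATH is the one the cluster actually uses, which may not hold if Cloudera Manager or Ambari points to a different JDK.

```shell
# java_major: hypothetical helper that prints the major Java version
# for a version string such as "1.7.0_80" (prints 7).
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;   # legacy scheme: 1.7.0_80 -> 7
    *)   echo "$1" | cut -d. -f1 ;;   # newer scheme: 11.0.2 -> 11
  esac
}

# If java is on the PATH, read the quoted version from `java -version`
# (written to stderr) and warn when it is not Java 7.
if command -v java >/dev/null 2>&1; then
  ver=$(java -version 2>&1 | awk -F'"' '/version/ && !seen {print $2; seen=1}')
  [ "$(java_major "$ver")" = "7" ] || echo "Warning: found Java '$ver', BYOH expects Java 7" >&2
fi
```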

Configuring Hive 

You configure nodes to use a particular distribution of Hive or Pig, generally the one provided with Cloudera or Hortonworks. Exceptions:
  • Cloudera uses a fork of Hive.
  • Old versions of Hortonworks use Hive 0.11.

BYOH requires Apache Hive 0.12, so you need to install Hive 0.12 and configure BYOH to use this installation.

  1. Download Hive 0.12 from http://apache.mirrors.pair.com/hive/hive-0.12.0/hive-0.12.0.tar.gz.
  2. Unpack the archive to install Hive.
    $ tar -xzvf hive-0.12.0.tar.gz
  3. Before you move the Hive installation, make sure the move does not overwrite the earlier version installed by Cloudera Manager or Ambari. For example, rename the Hive fork if necessary.
  4. Move the Hive installation that you unpacked to the following location:
    $ sudo mv hive-0.12.0 /usr/lib/hive12
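The overwrite check in step 3 and the move in step 4 can be combined into one guarded operation. The following is a sketch only: install_hive12 is a hypothetical helper name, and it simply refuses to proceed when the destination already exists instead of renaming anything for you.

```shell
# install_hive12: hypothetical helper that moves an unpacked Hive directory
# to its destination, but only if nothing already exists at the destination.
install_hive12() {
  src=$1
  dest=$2
  if [ ! -d "$src" ]; then
    echo "Source directory $src not found; unpack the archive first" >&2
    return 1
  fi
  if [ -e "$dest" ]; then
    echo "Refusing to overwrite existing $dest" >&2
    return 1
  fi
  mv "$src" "$dest"
}

# On a real node the move into /usr/lib typically needs root privileges, e.g.
# run this in a root shell:
#   install_hive12 hive-0.12.0 /usr/lib/hive12
```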
After making the changes, restart the external Hadoop system. For example, restart the CDH cluster from the Cloudera Manager-Cloudera Management Service drop-down. Finally, configure BYOH environment variables before using DataStax Enterprise.

Configuring BYOH environment variables 

The DataStax Enterprise installation includes a configuration file, byoh-env.sh, that sets up the DataStax Enterprise environment. Make the following changes on all nodes in the BYOH data center.

  1. Open the byoh-env.sh file in the following directory:
    • Installer-Services and Package installations: /etc/dse/byoh-env.sh
    • Installer-No Services and Tarball installations: install_location/bin/byoh-env.sh
  2. Set the DSE_HOME environment variable to the DataStax Enterprise installation directory.
    • Package installations:
      export DSE_HOME="/etc/dse"
    • Installer-Services installations:
      export DSE_HOME="/usr/share/dse"
    • Installer-No Services and Tarball installations:
      export DSE_HOME="install_location"
  3. In byoh-env.sh, set the HIVE_HOME variable to the location of the new Hive installation.
    HIVE_HOME="/usr/lib/hive12"
  4. Check that other configurable variables match the location of components in your environment.
  5. Configure byoh-env.sh for using Pig by editing the IP addresses to reflect your environment. On a single-node cluster, for example:
    export PIG_INITIAL_ADDRESS=127.0.0.1
    export PIG_OUTPUT_INITIAL_ADDRESS=127.0.0.1
    export PIG_INPUT_INITIAL_ADDRESS=127.0.0.1
  6. If a Hadoop data node is not running on the local machine, configure the DATA_NODE_LIST and NAME_NODE variables as follows:
    • DATA_NODE_LIST
      Provide a comma-separated list of Hadoop data node IP addresses this machine can access. The list is set to mapreduce.job.hdfs-servers in the client configuration.
    • NAME_NODE
      Provide the name or IP address of the name node.
    For example:
      export DATA_NODE_LIST="192.168.1.1, 192.168.1.2, 192.168.1.3"
      export NAME_NODE="localhost"
    If a Hadoop data node is running on the local machine, leave these variables blank. For example:
    export DATA_NODE_LIST=
    export NAME_NODE=
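
After editing byoh-env.sh, a short check can catch common mistakes before you start DataStax Enterprise. The sketch below is an illustration, not a tool shipped with the product: check_byoh_env is a hypothetical helper name, and the rule that DATA_NODE_LIST and NAME_NODE are either both set or both blank is an assumption read from step 6 above.

```shell
# check_byoh_env: hypothetical helper. Sources the given byoh-env.sh in a
# subshell (so the caller's environment is untouched) and validates it.
check_byoh_env() (
  . "$1" || exit 1
  status=0
  # DSE_HOME and HIVE_HOME should point to existing directories.
  for var in DSE_HOME HIVE_HOME; do
    eval "dir=\${$var:-}"
    if [ ! -d "$dir" ]; then
      echo "$var is unset or not a directory: '$dir'" >&2
      status=1
    fi
  done
  # Assumption from step 6: both Hadoop variables are set (remote data node)
  # or both are blank (local data node).
  if [ -n "${DATA_NODE_LIST:-}" ] && [ -z "${NAME_NODE:-}" ]; then
    echo "DATA_NODE_LIST is set but NAME_NODE is blank" >&2
    status=1
  fi
  if [ -z "${DATA_NODE_LIST:-}" ] && [ -n "${NAME_NODE:-}" ]; then
    echo "NAME_NODE is set but DATA_NODE_LIST is blank" >&2
    status=1
  fi
  exit "$status"
)

# Example use on a package installation:
#   check_byoh_env /etc/dse/byoh-env.sh && echo "byoh-env.sh looks consistent"
```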