BYOH Prerequisites and installation

Configure BYOH data centers to isolate workloads.

You must install DataStax Enterprise on all nodes: the nodes in the Hadoop cluster and the additional nodes outside the Hadoop cluster. Configure the additional nodes in one or more BYOH data centers to isolate workloads. In a BYOH data center, run sequential data loads, not random OLTP loads or Solr data loads.

Prerequisites 

The prerequisites for installing and using the BYOH model are:
  • Installation of a functioning CDH or HDP Hadoop cluster.
  • Installation and configuration of these master services on the Hadoop cluster:
    • Job Tracker or Resource Manager (required)
    • HDFS Name Node (required)
    • Secondary Name Node or High Availability Name Nodes (required)
  • At least one set of HDFS Data Nodes (required externally)

    The BYOH nodes must be able to communicate with the HDFS Data Nodes located outside the BYOH data center.

    During the installation procedure, you install only the Hadoop components you need in the BYOH data center: Task Trackers/Node Managers, MapReduce, and optional clients such as Hive and Pig. Install Hadoop on the same paths on all nodes. The CLASSPATH variables used by BYOH must work on all nodes.
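
Because BYOH expects the same Hadoop paths and CLASSPATH entries on every node, a quick per-node check can catch a node where Hadoop was installed somewhere else. The following is a minimal sketch, not a DataStax tool: the function name `check_classpath` is hypothetical, and it assumes you pass it the colon-separated classpath you intend BYOH to use (for example, the value of `HADOOP_CLASSPATH` on your cluster). Entries ending in `/*` are checked by testing their parent directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report classpath entries that do not exist on this node.
# Returns 0 if every entry exists, 1 otherwise (missing entries go to stderr).
check_classpath() {
  local classpath="$1" entry target missing=0
  local -a entries
  # Split on ":" without glob expansion, so "dir/*" stays literal.
  IFS=':' read -r -a entries <<< "$classpath"
  for entry in "${entries[@]}"; do
    [ -z "$entry" ] && continue            # ignore empty entries such as "a::b"
    target="$entry"
    case "$entry" in
      */\*) target="${entry%/\*}" ;;        # wildcard entry: check its directory
    esac
    if [ ! -e "$target" ]; then
      echo "missing: $entry" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run it on each node, for example `check_classpath "$HADOOP_CLASSPATH"`. A nonzero exit status and a `missing:` line on standard error point to a path that differs on that node.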

Installation procedure  

To install DataStax Enterprise:

  1. Ensure that you meet the prerequisites.
  2. On each node in the BYOH data center and the Hadoop cluster, install DataStax Enterprise, but do not start it. Install it as a plain Cassandra node, not as a node that runs CFS, Solr, or integrated Hadoop. If you are using the GUI installer, on the Node Setup page, select Cassandra Node in the Node Type drop-down.

  3. On packaged installations on the Hadoop cluster only, remove the init.d startup files for DataStax Enterprise and the DataStax Enterprise Agent. For example, with root privileges, stop the DSE processes if they started automatically, and then remove the files:
    $ sudo /etc/init.d/dse stop
    $ sudo /etc/init.d/datastax-agent stop
    $ sudo rm /etc/init.d/dse
    $ sudo rm /etc/init.d/datastax-agent
    Removing the startup files prevents DataStax Enterprise from starting accidentally on the Hadoop cluster.
  4. Deploy only the BYOH nodes in a virtual data center.
  5. After configuring the cassandra.yaml and dse.yaml files as described in the instructions for deploying a data center, copy both files to the nodes in the Hadoop cluster, overwriting the original files.
  6. Observe workload isolation best practices. Do not enable vnodes.
  7. Install the following Hadoop components and services on the BYOH nodes:
    • Task Tracker or Node Manager (required)
    • MapReduce (required)
    • Clients you want to use, for example Hive or Pig (optional)

    Including the HDFS Data Node in the BYOH data center is optional, but not recommended.
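
Step 5 can be scripted. The sketch below is an illustration under stated assumptions, not part of DataStax Enterprise: `push_dse_configs` and the `COPY_CMD` variable are hypothetical, it assumes the node holding the configured files can reach each Hadoop node over SSH with `scp` as a user allowed to overwrite the configuration files, and it assumes the files live at the same absolute paths on every node (so pass absolute paths). Setting `COPY_CMD="echo scp"` prints the copy commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy cassandra.yaml and dse.yaml to the same absolute
# paths on each Hadoop node, overwriting the remote files.
# Usage: push_dse_configs CASSANDRA_YAML DSE_YAML HOST...
push_dse_configs() {
  local cassandra_yaml="$1" dse_yaml="$2"
  shift 2
  # COPY_CMD defaults to scp; "echo scp" turns this into a dry run.
  local copy_cmd="${COPY_CMD:-scp}" host file
  # Refuse to copy anything unless both source files exist.
  for file in "$cassandra_yaml" "$dse_yaml"; do
    if [ ! -f "$file" ]; then
      echo "not found: $file" >&2
      return 1
    fi
  done
  for host in "$@"; do
    for file in "$cassandra_yaml" "$dse_yaml"; do
      # Unquoted on purpose so that COPY_CMD="echo scp" splits into two words.
      $copy_cmd "$file" "$host:$file" || return 1
    done
  done
}
```

For example, `push_dse_configs /path/to/cassandra.yaml /path/to/dse.yaml hadoop-node1 hadoop-node2`, with the paths and host names replaced by your own.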
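
For step 6, one way to confirm that vnodes are not enabled is to look at `num_tokens` in each node's cassandra.yaml: a single-token node leaves `num_tokens` commented out (or set to 1) and uses `initial_token` instead. The sketch below assumes that convention; `vnodes_enabled` is a hypothetical helper, and it only inspects the first uncommented `num_tokens:` line rather than fully parsing the YAML.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the given cassandra.yaml enables
# vnodes, that is, has an uncommented "num_tokens:" value greater than 1.
# Returns 1 if vnodes are not enabled, 2 if the file cannot be read.
vnodes_enabled() {
  local yaml="$1" tokens
  if [ ! -r "$yaml" ]; then
    echo "cannot read: $yaml" >&2
    return 2
  fi
  # Lines starting with "#" do not match, so commented-out settings are ignored.
  tokens=$(sed -n 's/^[[:space:]]*num_tokens:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$yaml" | head -n 1)
  [ -n "$tokens" ] && [ "$tokens" -gt 1 ]
}
```

For example, `if vnodes_enabled /path/to/cassandra.yaml; then echo "vnodes are enabled"; fi`.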
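
After step 7, a simple sanity check on each BYOH node is that the Hadoop command-line tools are on the `PATH`. This sketch rests on several assumptions: `check_byoh_commands` is hypothetical, it treats `hadoop` as the only required command by default (override it with the hypothetical `BYOH_REQUIRED_CMDS` variable, for example to add `yarn` on a YARN cluster), it reports `hive` and `pig` only as optional, and finding a command does not prove that the Task Tracker or Node Manager service is configured or running.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that required Hadoop commands are on PATH and
# report optional clients that are missing. Returns 1 if a required one is absent.
check_byoh_commands() {
  local required="${BYOH_REQUIRED_CMDS:-hadoop}" optional="hive pig" cmd status=0
  for cmd in $required; do          # word-split the space-separated list
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "required command not found: $cmd" >&2
      status=1
    fi
  done
  for cmd in $optional; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "optional client not installed: $cmd" >&2
    fi
  done
  return "$status"
}
```

Running `check_byoh_commands` on every BYOH node gives a quick, uniform report; the optional clients never affect the exit status.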

Separating workloads

Use separate data centers to deploy mixed workloads. Within the same data center, do not mix nodes that run the DSE integrated Hadoop Job Tracker and Task Trackers with nodes that run external Hadoop services. In BYOH mode, run external Hadoop services on the same nodes as Cassandra. You can enable CFS on these Cassandra nodes as a startup option, but doing so is not recommended.