BYOH prerequisites and installation

Configure BYOH datacenters to isolate workloads.

You must install DataStax Enterprise on all nodes: the nodes in the Hadoop cluster and additional nodes outside the Hadoop cluster. Configure the additional nodes in one or more BYOH datacenters to isolate workloads. Run sequential data loads, not random OLTP loads or Solr data loads, in a BYOH datacenter.

Prerequisites 

The prerequisites for installing and using the BYOH model are:
  • Installation of a functioning CDH or HDP Hadoop cluster.
  • Installation and configuration of these master services on the Hadoop cluster:
    • Job Tracker or Resource Manager (required)
    • HDFS Name Node (required)
    • Secondary Name Node or High Availability Name Nodes (required)
  • At least one set of HDFS Data Nodes located outside the BYOH datacenter (required)

    The BYOH nodes must be able to communicate with the HDFS Data Nodes located outside the BYOH datacenter.

    During the installation procedure, you install only the required Hadoop components in the BYOH datacenter: Task Trackers or Node Managers, MapReduce, and optional clients such as Hive and Pig. Install Hadoop on the same paths on all nodes; the CLASSPATH variables that BYOH uses must resolve on every node.
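
    As a quick sanity check from a prospective BYOH node, you can confirm that the external HDFS services are reachable and that the Hadoop CLASSPATH resolves. A minimal sketch, assuming a Name Node at namenode.example.com on the default port 8020 (substitute your own host and port):
    $ # List the HDFS root through the Name Node outside the BYOH datacenter
    $ hadoop fs -ls hdfs://namenode.example.com:8020/
    $ # Print the CLASSPATH the Hadoop client resolves; it must point to the
    $ # same paths on every BYOH and Hadoop node
    $ hadoop classpath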

Installation procedure  

To install DataStax Enterprise:

  1. Ensure that you meet the prerequisites.
  2. On each node in the BYOH and Hadoop clusters, install, but do not start, DataStax Enterprise. Install DataStax Enterprise as a plain Cassandra node; do not configure it to run CFS, Solr, or integrated Hadoop. If you are using the GUI installer, on Node Setup, select Cassandra Node as the Node Type.
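
    On a package installation, one way to confirm that the node will start as a plain Cassandra node is to check that the integrated services are disabled in /etc/default/dse (a sketch; 0 is the shipped default for both flags):
    $ # Integrated Hadoop and Solr must stay disabled on BYOH nodes
    $ grep -E 'HADOOP_ENABLED|SOLR_ENABLED' /etc/default/dse
    HADOOP_ENABLED=0
    SOLR_ENABLED=0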

  3. On packaged installations on the Hadoop cluster only, remove the init.d startup files for DataStax Enterprise and DataStax Enterprise Agent. For example, as root, stop DSE processes if they started up automatically, and then remove the files:
    $ sudo /etc/init.d/dse stop
    $ sudo /etc/init.d/datastax-agent stop
    $ sudo rm /etc/init.d/dse
    $ sudo rm /etc/init.d/datastax-agent
    Removing the startup files prevents accidental start up of DataStax Enterprise on the Hadoop cluster.
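    A quick way to verify the removal (the grep pattern is just an example):
    $ # No output means no DSE startup files remain in /etc/init.d
    $ ls /etc/init.d | grep -E 'dse|datastax'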
  4. Deploy only the BYOH nodes in a virtual datacenter.
  5. After configuring the cassandra.yaml and dse.yaml files as described in instructions for deploying the datacenter, copy both files to the nodes in the Hadoop cluster, overwriting the original files.
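    A sketch of this step, assuming the GossipingPropertyFileSnitch is configured, the BYOH datacenter is named BYOH, and a Hadoop node is reachable as hadoop-node1 (the datacenter name and host are placeholders; paths shown are for package installations):
    $ # cassandra-rackdc.properties on each BYOH node names the datacenter
    $ cat /etc/dse/cassandra/cassandra-rackdc.properties
    dc=BYOH
    rack=rack1
    $ # Overwrite the original files on each node in the Hadoop cluster
    $ scp /etc/dse/cassandra/cassandra.yaml root@hadoop-node1:/etc/dse/cassandra/
    $ scp /etc/dse/dse.yaml root@hadoop-node1:/etc/dse/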
  6. Observe workload isolation best practices. Do not enable vnodes.
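    Leaving vnodes disabled means leaving num_tokens commented out and assigning each node a single initial_token in cassandra.yaml. A sketch of the expected settings (the token value is a placeholder; calculate real tokens for your cluster size):
    $ # Vnodes stay disabled when num_tokens is commented out and one
    $ # initial_token is set per node
    $ grep -E 'num_tokens|initial_token' /etc/dse/cassandra/cassandra.yaml
    # num_tokens: 256
    initial_token: -9223372036854775808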
  7. Install the following Hadoop components and services on the BYOH nodes.
    • Task Tracker or Node Manager (required)
    • MapReduce (required)
    • Clients you want to use: Hive or Pig, for example (optional)

    Including the HDFS Data Node in the BYOH datacenter is optional, but not recommended.
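
    Package names vary with the Hadoop distribution and MapReduce version, so the following is only an illustration for a CDH node running MRv1 (on YARN, the equivalent service package is hadoop-yarn-nodemanager; check your distribution's documentation for exact names):
    $ # Task Tracker (MRv1) plus the optional Hive and Pig clients
    $ sudo yum install hadoop-0.20-mapreduce-tasktracker
    $ sudo yum install hive pig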

Separating workloads

Use separate datacenters to deploy mixed workloads. Within the same datacenter, do not mix nodes that run the DSE Hadoop integrated Job Tracker and Task Trackers with external Hadoop services. In BYOH mode, run external Hadoop services on the same nodes as Cassandra. Although you can enable CFS on these Cassandra nodes as a startup option, using CFS as a primary data store is not recommended.
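
Once both datacenters are running, nodetool status provides a quick check that the BYOH nodes are isolated in their own datacenter; its output groups nodes by datacenter name:
  $ # The BYOH datacenter and the Cassandra datacenter are listed separately
  $ nodetool status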
The location of the dse.yaml file depends on the type of installation:
  • Installer-Services and package installations: /etc/dse/dse.yaml
  • Installer-No Services and tarball installations: install_location/resources/dse/conf/dse.yaml

The location of the cassandra.yaml file depends on the type of installation:
  • Package installations: /etc/dse/cassandra/cassandra.yaml
  • Tarball installations: install_location/resources/cassandra/conf/cassandra.yaml