BYOH prerequisites and installation

Configure BYOH datacenters to isolate workloads.

You must install DataStax Enterprise on all nodes: the nodes in the Hadoop cluster and additional nodes outside the Hadoop cluster. Configure the additional nodes in one or more BYOH datacenters to isolate workloads. Run sequential data loads, not random OLTP loads or Solr data loads, in a BYOH datacenter.

Use separate datacenters to deploy mixed workloads. Within the same datacenter, do not mix nodes that run the DSE Hadoop integrated Job Tracker and Task Trackers with external Hadoop services. In BYOH mode, run external Hadoop services on the same nodes as Cassandra. Although you can enable CFS on these Cassandra nodes as a startup option, using CFS as a primary data store is not recommended.


The prerequisites for installing and using the BYOH model are:
  • Installation of a functioning CDH or HDP Hadoop cluster.
  • Installation and configuration of these master services on the Hadoop cluster:
    • Job Tracker or Resource Manager (required)
    • HDFS Name Node (required)
    • Secondary Name Node or High Availability Name Nodes (required)
  • At least one set of HDFS Data Nodes, located outside the BYOH datacenter (required)

    The BYOH nodes must be able to communicate with the HDFS Data Node that is located outside the BYOH datacenter.

    During the installation procedure, you install only the required Hadoop components in the BYOH datacenter: Task Trackers or Node Managers, MapReduce, and optional clients such as Hive and Pig. Install Hadoop on the same paths on all nodes so that the CLASSPATH variables used by BYOH work on every node.
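
    As a quick sanity check that Hadoop is installed on the same path everywhere, you can compare the binary location and CLASSPATH on each node. This is a sketch; the hostnames byoh1, byoh2, and byoh3 are placeholders for your own BYOH node names:

      $ for host in byoh1 byoh2 byoh3; do ssh $host 'hostname; which hadoop; hadoop classpath'; done

    Every node should report the same hadoop path and an equivalent CLASSPATH; a mismatch on any node typically means BYOH jobs will fail there.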


  1. On each node in the BYOH and Hadoop clusters, install DataStax Enterprise, but do not start it. Install DataStax Enterprise as a plain Cassandra node; do not configure it to run CFS, Solr, or integrated Hadoop. If you are using the GUI installer, select Cassandra Node as the Node Type on the Node Setup screen.

  2. On packaged installations on the Hadoop cluster only, remove the init.d startup files for DataStax Enterprise.

    On package installs, stop the DataStax Enterprise processes if they started automatically, and then remove the files:

    $ sudo /etc/init.d/dse stop
    $ sudo rm -rf /etc/init.d/dse

    Removing the startup files prevents accidental start up of DataStax Enterprise on the Hadoop cluster.
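
    To confirm that DataStax Enterprise can no longer start accidentally on a Hadoop node, verify that the startup file is gone and that no DSE process remains (a sketch; exact output varies by platform):

      $ ls /etc/init.d/dse
      $ ps auxww | grep -i dse

    The first command should report that the file does not exist, and the second should show no running DSE processes.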

  3. Deploy only the BYOH nodes in a virtual datacenter.
  4. Export the DataStax Enterprise client configuration to the nodes in the Hadoop cluster.
    1. On DataStax Enterprise nodes:
      dse client-tool configuration export dse-config.jar
    2. On the Hadoop nodes:
      dse client-tool configuration import dse-config.jar
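
    Putting steps 4a and 4b together, a typical flow exports the configuration on one DataStax Enterprise node, copies the jar to each Hadoop node, and imports it there. The hostnames dse1 and hadoop1 are hypothetical; repeat the copy and import for each Hadoop node:

      $ ssh dse1 'dse client-tool configuration export dse-config.jar'
      $ scp dse1:dse-config.jar hadoop1:
      $ ssh hadoop1 'dse client-tool configuration import dse-config.jar'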
  5. Observe workload isolation best practices. Do not enable vnodes.
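
    Because vnodes must stay disabled on BYOH nodes, each node's cassandra.yaml should use a single, explicitly assigned token rather than multiple tokens. A minimal excerpt, assuming the Murmur3 partitioner; the token value below is illustrative, so calculate a real token for each node:

      # cassandra.yaml excerpt on a BYOH node
      num_tokens: 1
      initial_token: -9223372036854775808   # illustrative; assign one computed token per node
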
  6. Install the following Hadoop components and services on the BYOH nodes.
    • Task Tracker or Node Manager (required)
    • MapReduce (required)
    • Clients you want to use, such as Hive or Pig (optional)

    Including the HDFS Data Node in the BYOH datacenter is optional, but not recommended.
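
    After installing the components, you can verify that a BYOH node reaches the external HDFS Data Nodes and that its Task Tracker or Node Manager is running. This is a sketch; the NameNode URI is a placeholder for your own cluster:

      $ hadoop fs -ls hdfs://namenode.example.com:8020/
      $ jps

    The first command should list the HDFS root directory, and jps should show a TaskTracker (MRv1) or NodeManager (YARN) process on the BYOH node.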