Installing DataStax Enterprise 5.1 using the binary tarball

Use these instructions for installing DataStax Enterprise (DSE) 5.1 on Linux-based platforms using a binary tarball.

Some things to know about installing DSE

  • The latest version of DataStax Enterprise 5.1 is 5.1.
  • When installed from the binary tarball, DataStax Enterprise runs as a stand-alone process.
  • This procedure installs DataStax Enterprise 5.1 and the developer-related tools: Javadoc, DataStax Enterprise demos, DataStax Studio, and the DSE Graph Loader.

    It does not install OpsCenter, DataStax Agent, Studio, or Graph Loader.

spark-env.sh

The default location of the spark-env.sh file depends on the type of installation:

Package installations
Installer-Services installations

/etc/dse/spark/spark-env.sh

Tarball installations
Installer-No Services installations

installation_location/resources/spark/conf/spark-env.sh

cassandra.yaml

The location of the cassandra.yaml file depends on the type of installation:

Package installations
Installer-Services installations

/etc/dse/cassandra/cassandra.yaml

Tarball installations
Installer-No Services installations

installation_location/resources/cassandra/conf/cassandra.yaml

dse.yaml

The location of the dse.yaml file depends on the type of installation:

Package installations
Installer-Services installations

/etc/dse/dse.yaml

Tarball installations
Installer-No Services installations

installation_location/resources/dse/conf/dse.yaml
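
The location rules above can be summarized in a small lookup. The following is a minimal sketch, not part of DataStax Enterprise: the function name dse_conf_path and the labels "package" (Package and Installer-Services installations) and "tarball" (Tarball and Installer-No Services installations) are hypothetical and chosen only for illustration.

```shell
# Hypothetical helper: print the default path of a DSE configuration file.
# Usage: dse_conf_path <package|tarball> <installation_location> <file>
dse_conf_path() {
  conf_type=$1
  conf_root=$2
  conf_file=$3

  # Directory that holds each file, relative to /etc/dse (package) or
  # installation_location/resources (tarball).
  case "$conf_file" in
    spark-env.sh)   conf_pkg_dir="/etc/dse/spark";     conf_tar_dir="spark/conf" ;;
    cassandra.yaml) conf_pkg_dir="/etc/dse/cassandra"; conf_tar_dir="cassandra/conf" ;;
    dse.yaml)       conf_pkg_dir="/etc/dse";           conf_tar_dir="dse/conf" ;;
    *) echo "unknown file: $conf_file" >&2; return 1 ;;
  esac

  case "$conf_type" in
    package) echo "$conf_pkg_dir/$conf_file" ;;
    tarball) echo "$conf_root/resources/$conf_tar_dir/$conf_file" ;;
    *) echo "unknown installation type: $conf_type" >&2; return 1 ;;
  esac
}
```

For example, dse_conf_path tarball /opt/dse cassandra.yaml prints /opt/dse/resources/cassandra/conf/cassandra.yaml, where /opt/dse stands in for your installation_location.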

Prerequisites

Hardware requirements

See Recommended production settings.

Procedure

Important: End User License Agreement (EULA). By downloading this DataStax product, you agree to the terms of the EULA.

In a terminal window:

  1. Verify that a required version of Java is installed:
    java -version
    Note: DataStax recommends the latest build of a Technology Compatibility Kit (TCK) Certified OpenJDK version 8.

    If OpenJDK, the results should look like:

    openjdk version "1.8.0_171"
    OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11)
    OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)

    If Oracle Java, the results should look like:

    java version "1.8.0_181"
    Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

    If not OpenJDK 8 or Oracle Java 8, see Installing the JDK.
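
If you want to script the Java check in step 1, it could look like the sketch below. The function name is_java8 is hypothetical, and the check assumes the version string appears on the first line of the output, as in the two samples above.

```shell
# Hypothetical helper: succeed when the first line of `java -version` output
# reports a 1.8.x (Java 8) runtime.
# Usage: is_java8 "<text printed by java -version>"
is_java8() {
  java_first_line=$(printf '%s\n' "$1" | head -n 1)
  case "$java_first_line" in
    *'version "1.8.'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example call, left commented out so the sketch has no side effects.
# `java -version` writes to standard error, hence the 2>&1.
# is_java8 "$(java -version 2>&1)" || echo "See Installing the JDK."
```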

  2. When installing from the binary tarball, you can either download the tarball and then extract the files, or use curl.
    • Download and extract the tarball:
      Note: The latest version is 5.1. To view the available versions, see the Release notes.
      1. Download the tarball from Download DataStax Enterprise.
      2. Extract the files:
        tar -xzvf dse-version_number-bin.tar.gz

        For example:

        tar -xzvf dse-5.1-bin.tar.gz
    • Use curl to install the selected version:
      CAUTION: If you choose this method, your password is retained in the shell history. To avoid this security issue, DataStax recommends using curl with the --netrc or --netrc-file option.
      Download and extract the tarball using curl:
      curl -L https://downloads.datastax.com/enterprise/dse-version_number-bin.tar.gz | tar xz
      For example:
      curl -L https://downloads.datastax.com/enterprise/dse-5.1-bin.tar.gz | tar xz

    The files are downloaded and extracted into the 5.1 directory.
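
The caution in step 2 recommends the curl --netrc or --netrc-file option. The sketch below shows one way this might be set up. The function names write_netrc and download_dse are hypothetical, and it is an assumption that the download server accepts a login and password supplied this way. The password is read from standard input so that it never appears as a command-line argument.

```shell
# Hypothetical helper: write a netrc file that only the current user can read.
# The password is read from standard input (one line).
# Usage: write_netrc <netrc_file> <login>
write_netrc() {
  netrc_file=$1
  netrc_login=$2
  IFS= read -r netrc_password || return 1
  # Create the file with owner-only permissions before writing the secret.
  ( umask 077; : > "$netrc_file" ) || return 1
  chmod 600 "$netrc_file" || return 1
  printf 'machine downloads.datastax.com login %s password %s\n' \
    "$netrc_login" "$netrc_password" > "$netrc_file"
}

# Hypothetical wrapper around the curl command from this step.
# Usage: download_dse <netrc_file> <version_number>
download_dse() {
  curl -L --netrc-file "$1" \
    "https://downloads.datastax.com/enterprise/dse-$2-bin.tar.gz" | tar xz
}
```

When you run write_netrc interactively, type the password and press Enter. The terminal echoes the typed password unless you turn echo off first, for example with stty -echo.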

  3. You can use either the default data and logging directory locations or define your locations:
    • To use the default data and logging directory locations, create and change ownership for the following:
      sudo mkdir -p /var/lib/cassandra; sudo chown -R $USER:$GROUP /var/lib/cassandra &&
        sudo mkdir -p /var/log/cassandra; sudo chown -R $USER:$GROUP /var/log/cassandra &&
        sudo mkdir -p /var/lib/dsefs; sudo chown -R $USER:$GROUP /var/lib/dsefs
    • To define your own data and logging directory locations:
      1. In the installation_location, make the data and logging directories. For example:
        mkdir installation_location/dse-data &&
          cd installation_location/dse-data &&
          mkdir data &&
          mkdir commitlog &&
          mkdir saved_caches &&
          mkdir hints &&
          mkdir cdc_raw
      2. Go to the directory containing the cassandra.yaml file:
        cd installation_location/resources/cassandra/conf
      3. Edit the following lines in the cassandra.yaml file:
        data_file_directories:
                - full_path_to_installation_location/dse-data/data
        commitlog_directory: full_path_to_installation_location/dse-data/commitlog
        saved_caches_directory: full_path_to_installation_location/dse-data/saved_caches
        hints_directory: full_path_to_installation_location/dse-data/hints
        cdc_raw_directory: full_path_to_installation_location/dse-data/cdc_raw
    • Optional: Define your own Spark directories:
      1. Make the directories for the Spark lib and log directories.
      2. Edit the spark-env.sh file to match the locations of your Spark lib and log directories, as described in Configuring Spark nodes.
      3. Make a directory for the DSEFS data directory and set its location in dsefs_options.
  4. You can use either the default data and logging directory locations or define your locations:
    • Default directory locations: If you want to use the default data and logging directory locations, create and change ownership for the following:
      • /var/lib/cassandra
      • /var/log/cassandra
      sudo mkdir -p /var/lib/cassandra; sudo chown -R $USER:$GROUP /var/lib/cassandra &&
        sudo mkdir -p /var/log/cassandra; sudo chown -R $USER:$GROUP /var/log/cassandra
    • Define your own directory locations: If you want to define your own data and logging directory locations:
      1. In the installation_location, make the data and logging directories. For example:
        mkdir dse-data; chown -R $USER:$GROUP dse-data &&
          cd dse-data &&
          mkdir commitlog; chown -R $USER:$GROUP commitlog &&
          mkdir saved_caches; chown -R $USER:$GROUP saved_caches &&
          mkdir hints; chown -R $USER:$GROUP hints &&
          mkdir cdc_raw; chown -R $USER:$GROUP cdc_raw
      2. Go to the directory containing the cassandra.yaml file:
        cd installation_location/resources/cassandra/conf
      3. Update the following lines in the cassandra.yaml file to match the custom locations:
        data_file_directories: 
                - full_path_to_installation_location/dse-data
        commitlog_directory: full_path_to_installation_location/dse-data/commitlog
        saved_caches_directory: full_path_to_installation_location/dse-data/saved_caches
        hints_directory: full_path_to_installation_location/dse-data/hints
        cdc_raw_directory: full_path_to_installation_location/dse-data/cdc_raw
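
The custom-location steps can be combined into one helper. This is a minimal sketch under stated assumptions: the function name make_dse_data_dirs is hypothetical, and it places the data files in a dedicated dse-data/data subdirectory. It only prints the cassandra.yaml settings. Copying them into the file remains a manual step.

```shell
# Hypothetical helper: create custom data directories under an installation
# location and print matching cassandra.yaml settings.
# Usage: make_dse_data_dirs <full_path_to_installation_location>
make_dse_data_dirs() {
  data_root=$1/dse-data
  for data_sub in data commitlog saved_caches hints cdc_raw; do
    mkdir -p "$data_root/$data_sub" || return 1
  done
  # Print the settings; nothing is written to cassandra.yaml.
  cat <<EOF
data_file_directories:
    - $data_root/data
commitlog_directory: $data_root/commitlog
saved_caches_directory: $data_root/saved_caches
hints_directory: $data_root/hints
cdc_raw_directory: $data_root/cdc_raw
EOF
}
```

Because the directories are created without sudo by the user who runs the helper, that user already owns them and no chown is needed.
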
  5. Optional: If using DSE Analytics, you can use either the default Spark data and logging directory locations or define your locations:
    • Default directory locations: If you want to use the default Spark directory locations, create and change ownership for the following:
      • /var/lib/dsefs
      • /var/lib/spark
      • /var/log/spark
      sudo mkdir -p /var/lib/dsefs; sudo chown -R $USER:$GROUP /var/lib/dsefs &&
        sudo mkdir -p /var/lib/spark; sudo chown -R $USER:$GROUP /var/lib/spark &&
        sudo mkdir -p /var/log/spark; sudo chown -R $USER:$GROUP /var/log/spark &&
        sudo mkdir -p /var/lib/spark/rdd; sudo chown -R $USER:$GROUP /var/lib/spark/rdd &&
        sudo mkdir -p /var/log/spark/master; sudo chown -R $USER:$GROUP /var/log/spark/master &&
        sudo mkdir -p /var/log/spark/alwayson_sql; sudo chown -R $USER:$GROUP /var/log/spark/alwayson_sql &&
        sudo mkdir -p /var/lib/spark/worker; sudo chown -R $USER:$GROUP /var/lib/spark/worker
    • Define your own directory locations: If you want to define your own Spark directory locations:
      1. In the installation_location, make the DSEFS and Spark data and logging directories. For example:
        mkdir dsefs; chown -R $USER:$GROUP dsefs &&
          mkdir spark; chown -R $USER:$GROUP spark &&  
          cd spark && 
          mkdir log; chown -R $USER:$GROUP log &&
          mkdir rdd; chown -R $USER:$GROUP rdd && 
          mkdir worker; chown -R $USER:$GROUP worker &&
          cd log &&
          mkdir worker; chown -R $USER:$GROUP worker &&
          mkdir master; chown -R $USER:$GROUP master &&
          mkdir alwayson_sql; chown -R $USER:$GROUP alwayson_sql
      2. Go to the directory containing the spark-env.sh file:
        cd installation_location/resources/spark/conf
      3. Uncomment and update the following lines in the spark-env.sh file:
        export SPARK_WORKER_DIR="full_path_to_installation_location/spark/worker"
        export SPARK_EXECUTOR_DIRS="full_path_to_installation_location/spark/rdd"
        export SPARK_WORKER_LOG_DIR="full_path_to_installation_location/spark/log/worker"
        export SPARK_MASTER_LOG_DIR="full_path_to_installation_location/spark/log/master"
        export ALWAYSON_SQL_LOG_DIR="full_path_to_installation_location/spark/log/alwayson_sql"
      4. Go to the directory containing the dse.yaml file:
        cd installation_location/resources/dse/conf
      5. Uncomment and update the DSEFS directory in dse.yaml:
        work_dir: full_path_to_installation_location/dsefs
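
The custom Spark locations can be prepared the same way. In this sketch the function name make_spark_dirs is hypothetical. It creates the directories named in the example above and prints the spark-env.sh exports. Editing spark-env.sh and setting work_dir in dse.yaml remain manual steps.

```shell
# Hypothetical helper: create custom DSEFS and Spark directories and print
# the matching spark-env.sh exports.
# Usage: make_spark_dirs <full_path_to_installation_location>
make_spark_dirs() {
  spark_root=$1
  for spark_sub in dsefs spark/worker spark/rdd \
      spark/log/worker spark/log/master spark/log/alwayson_sql; do
    mkdir -p "$spark_root/$spark_sub" || return 1
  done
  # Print the exports; nothing is written to spark-env.sh.
  cat <<EOF
export SPARK_WORKER_DIR="$spark_root/spark/worker"
export SPARK_EXECUTOR_DIRS="$spark_root/spark/rdd"
export SPARK_WORKER_LOG_DIR="$spark_root/spark/log/worker"
export SPARK_MASTER_LOG_DIR="$spark_root/spark/log/master"
export ALWAYSON_SQL_LOG_DIR="$spark_root/spark/log/alwayson_sql"
EOF
}
```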

    Result

    DataStax Enterprise is ready for additional configuration:

    • For production, be sure to change the default cassandra superuser account. Failing to do so is a security risk. See Creating superuser accounts.
    • DataStax Enterprise provides several types of workloads (default is transactional). See startup options for service or stand-alone installations.
    • What's next below provides links to related tasks and information.
  6. Optional: Single-node cluster installations only:
    1. Start DataStax Enterprise from the installation directory:
      bin/dse cassandra
      Note: For other start options, see Starting DataStax Enterprise as a stand-alone process.
    2. From the installation directory, verify that DataStax Enterprise is running:
      bin/nodetool status
      Results using vnodes:
      Datacenter: Cassandra
      =====================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address    Load       Tokens  Owns    Host ID                               Rack
      UN  127.0.0.1  82.43 KB   128     ?       40725dc8-7843-43ae-9c98-7c532b1f517e  rack1
      Results not using vnodes:
      Datacenter: Analytics
      =====================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address         Load       Owns    Host ID                               Token                 Rack
      UN  172.16.222.136  103.24 KB  ?       3c1d0657-0990-4f78-a3c0-3e0c37fc3a06  1647352612226902707   rack1
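
To check the nodetool status output from a script, one option is to look at the two-letter status code at the start of each node row. The sketch below is an illustration only: the function name all_nodes_up is hypothetical, and it assumes node rows begin with the status code in the first column, as in the sample output above.

```shell
# Hypothetical helper: read `nodetool status` output on standard input and
# succeed only if at least one node is listed and every node is UN.
all_nodes_up() {
  awk '
    # Node rows start with U or D, then N, L, J, or M, then a space.
    /^[UD][NLJM] / { nodes++; if ($1 != "UN") bad++ }
    END { exit !(nodes > 0 && bad == 0) }
  '
}

# Example call, left commented out so the sketch has no side effects:
# bin/nodetool status | all_nodes_up && echo "All nodes are up and normal"
```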

What's next