Installing DataStax Enterprise 5.1 using the binary tarball

Instructions for installing DataStax Enterprise 5.1 on any supported Linux-based platform.

Use these instructions for installing DataStax Enterprise (DSE) 5.1 on Linux-based platforms using a binary tarball.

Some things to know about installing DSE

  • The latest version of DataStax Enterprise 5.1 is 5.1.16.
  • When DSE is installed, it creates a cassandra user in the database. Do not use the cassandra user in production. See Creating superuser accounts.
  • When installed from the binary tarball, DataStax Enterprise runs as a stand-alone process.
  • This procedure installs DataStax Enterprise 5.1 and the developer-related tools: Javadoc and the DataStax Enterprise demos. It does not install OpsCenter, the DataStax Agent, DataStax Studio, or the DSE Graph Loader.
  • After installing, you must configure and start DataStax Enterprise.

spark-env.sh

The default location of the spark-env.sh file depends on the type of installation:

  • Package installations and Installer-Services installations: /etc/dse/spark/spark-env.sh
  • Tarball installations and Installer-No Services installations: installation_location/resources/spark/conf/spark-env.sh

cassandra.yaml

The location of the cassandra.yaml file depends on the type of installation:

  • Package installations and Installer-Services installations: /etc/dse/cassandra/cassandra.yaml
  • Tarball installations and Installer-No Services installations: installation_location/resources/cassandra/conf/cassandra.yaml
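The two locations above can be resolved programmatically. A minimal sketch, assuming a tarball install lives at DSE_HOME (a placeholder for your installation_location, not an official variable) and falling back to the fixed /etc/dse paths when they exist:

```shell
# Resolve cassandra.yaml and spark-env.sh for either installation type.
# DSE_HOME is a placeholder for your tarball installation_location.
DSE_HOME="${DSE_HOME:-$HOME/dse-5.1.16}"

if [ -f /etc/dse/cassandra/cassandra.yaml ]; then
    # Package or Installer-Services installation
    CASSANDRA_YAML=/etc/dse/cassandra/cassandra.yaml
    SPARK_ENV=/etc/dse/spark/spark-env.sh
else
    # Tarball or Installer-No Services installation
    CASSANDRA_YAML="$DSE_HOME/resources/cassandra/conf/cassandra.yaml"
    SPARK_ENV="$DSE_HOME/resources/spark/conf/spark-env.sh"
fi

echo "cassandra.yaml: $CASSANDRA_YAML"
echo "spark-env.sh:   $SPARK_ENV"
```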

Prerequisites

  • A supported platform.
  • Latest build of a Technology Compatibility Kit (TCK) Certified OpenJDK version 8 or Oracle Java SE Runtime Environment 8 (JRE or JDK). Earlier or later versions are not supported.
    Attention: Although Oracle JRE/JDK 8 is supported, DataStax does more extensive testing on OpenJDK 8. This change is due to the end of public updates for Oracle JRE/JDK 8. Java 9 is not supported.
  • RedHat-compatible distributions require EPEL (Extra Packages for Enterprise Linux).
  • Python 2.7.x (For older RHEL distributions, see Installing Python 2.7 on older RHEL-based package installations.)
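The Java and Python prerequisites above can be checked up front. A minimal sketch, not an official DataStax tool; it assumes java and python2.7 are on the PATH (Java 8 reports its version as 1.8.x):

```shell
# Check the DSE 5.1 prerequisites: Java 8 and Python 2.7.
check_java() {
    # DSE 5.1 supports only Java 8; earlier or later versions are not supported.
    case "$1" in
        1.8.*) echo "OK: Java 8 ($1)" ;;
        *)     echo "WARNING: Java 8 required, found: ${1:-none}" ;;
    esac
}

check_python() {
    case "$1" in
        2.7.*) echo "OK: Python 2.7 ($1)" ;;
        *)     echo "WARNING: Python 2.7 required, found: ${1:-none}" ;;
    esac
}

check_java   "$(java -version 2>&1 | awk -F '"' '/version/ {print $2}')"
check_python "$(python2.7 -V 2>&1 | awk '{print $2}')"
```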

Hardware requirements

See Recommended production settings and the DataStax Enterprise Reference Architecture white paper.

Procedure

Important: End User License Agreement (EULA). By downloading this DataStax product, you agree to the terms of the EULA.

In a terminal window:

  1. Verify that a required version of Java is installed:
    java -version
    Note: DataStax recommends the latest build of a Technology Compatibility Kit (TCK) Certified OpenJDK version 8.

    If not OpenJDK 8 or Oracle Java 8, see Installing supporting software.

    Important:
    • Although Oracle JRE/JDK 8 is supported, DataStax does more extensive testing on OpenJDK 8 starting with DSE 5.1.11. This change is due to the end of public updates for Oracle JRE/JDK 8. Java 9 is not supported.
    • Package management tools do not install OpenJDK 8 or Oracle Java.
  2. When installing from the binary tarball, you can either download the tarball and then extract the files, or use curl.
    • Download and extract the tarball specifying the version:
      Note: The latest version is 5.1.16. To view the available versions, see the Release notes.
      1. Download the tarball from Download DataStax Enterprise.
      2. Extract the files:
        tar -xzvf dse-5.1.16-bin.tar.gz
    • Using curl to download and extract the tarball:
      curl -L https://downloads.datastax.com/enterprise/dse-5.1.16-bin.tar.gz | tar xz

    The files are downloaded and extracted into the dse-version directory.

  3. You can use either the default data and logging directory locations or define your own:
    • Default directory locations: To use the default data and logging directory locations, create the following directories and change their ownership:
      • /var/lib/cassandra
      • /var/log/cassandra
      sudo mkdir -p /var/lib/cassandra; sudo chown -R $USER:$GROUP /var/lib/cassandra &&
        sudo mkdir -p /var/log/cassandra; sudo chown -R $USER:$GROUP /var/log/cassandra
    • Define your own directory locations: To define your own data and logging directory locations:
      1. In the installation_location, make the data and logging directories. For example:
        mkdir dse-data; chown -R $USER:$GROUP dse-data &&
          cd dse-data &&
          mkdir commitlog; chown -R $USER:$GROUP commitlog &&
          mkdir saved_caches; chown -R $USER:$GROUP saved_caches &&
          mkdir hints; chown -R $USER:$GROUP hints &&
          mkdir cdc_raw; chown -R $USER:$GROUP cdc_raw
      2. Go to the directory containing the cassandra.yaml file:
        cd installation_location/resources/cassandra/conf
      3. Update the following lines in the cassandra.yaml file to match the custom locations:
        data_file_directories:
                - full_path_to_installation_location/dse-data
        commitlog_directory: full_path_to_installation_location/dse-data/commitlog
        saved_caches_directory: full_path_to_installation_location/dse-data/saved_caches
        hints_directory: full_path_to_installation_location/dse-data/hints
        cdc_raw_directory: full_path_to_installation_location/dse-data/cdc_raw
  4. Optional: If using DSE analytics, you can use either the default Spark data and logging directory locations or define your own:
    • Default directory locations: To use the default Spark directory locations, create the following directories and change their ownership:
      • /var/lib/dsefs
      • /var/lib/spark
      • /var/log/spark
      sudo mkdir -p /var/lib/dsefs; sudo chown -R $USER:$GROUP /var/lib/dsefs &&
        sudo mkdir -p /var/lib/spark; sudo chown -R $USER:$GROUP /var/lib/spark &&
        sudo mkdir -p /var/log/spark; sudo chown -R $USER:$GROUP /var/log/spark &&
        sudo mkdir -p /var/lib/spark/rdd; sudo chown -R $USER:$GROUP /var/lib/spark/rdd &&
        sudo mkdir -p /var/lib/spark/worker; sudo chown -R $USER:$GROUP /var/lib/spark/worker &&
        sudo mkdir -p /var/log/spark/master; sudo chown -R $USER:$GROUP /var/log/spark/master &&
        sudo mkdir -p /var/log/spark/alwayson_sql; sudo chown -R $USER:$GROUP /var/log/spark/alwayson_sql
    • Define your own directory locations: To define your own Spark directory locations:
      1. In the installation_location, make the data and logging directories. For example:
        mkdir dsefs; chown -R $USER:$GROUP dsefs &&
          mkdir spark; chown -R $USER:$GROUP spark &&
          cd spark &&
          mkdir log; chown -R $USER:$GROUP log &&
          mkdir rdd; chown -R $USER:$GROUP rdd &&
          mkdir worker; chown -R $USER:$GROUP worker &&
          cd log &&
          mkdir worker; chown -R $USER:$GROUP worker &&
          mkdir master; chown -R $USER:$GROUP master &&
          mkdir alwayson_sql; chown -R $USER:$GROUP alwayson_sql
      2. Go to the directory containing the spark-env.sh file:
        cd installation_location/resources/spark/conf
      3. Uncomment and update the following lines in the spark-env.sh file, as described in Configuring Spark nodes:
        export SPARK_WORKER_DIR="full_path_to_installation_location/spark/worker"
        export SPARK_EXECUTOR_DIRS="full_path_to_installation_location/spark/rdd"
        export SPARK_WORKER_LOG_DIR="full_path_to_installation_location/spark/log/worker"
        export SPARK_MASTER_LOG_DIR="full_path_to_installation_location/spark/log/master"
        export ALWAYSON_SQL_LOG_DIR="full_path_to_installation_location/spark/log/alwayson_sql"
      4. Go to the directory containing the dse.yaml file:
        cd installation_location/resources/dse/conf
      5. Uncomment and update the DSEFS work directory in the dsefs_options section of dse.yaml:
        work_dir: full_path_to_installation_location/dsefs

    DataStax Enterprise is ready for additional configuration.

  5. Optional: Single-node cluster installations only:
    1. Start DataStax Enterprise from the installation directory:
      bin/dse cassandra
      Note: For other start options, see Starting DataStax Enterprise as a stand-alone process.
    2. From the installation directory, verify that DataStax Enterprise is running:
      bin/nodetool status
      Results using vnodes:
      Datacenter: Cassandra
      =====================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address    Load       Tokens  Owns    Host ID                               Rack
      UN  127.0.0.1  82.43 KB   128     ?       40725dc8-7843-43ae-9c98-7c532b1f517e  rack1
      Results not using vnodes:
      Datacenter: Analytics
      =====================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address         Load       Owns    Host ID                               Token                 Rack
      UN  172.16.222.136  103.24 KB  ?       3c1d0657-0990-4f78-a3c0-3e0c37fc3a06  1647352612226902707   rack1
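A newly started node takes a little while to join the ring, so a single nodetool status call right after startup may not yet show UN (Up/Normal). The following sketch polls for that state; run it from the installation directory. The retry count and 5-second interval are assumptions for illustration, not DataStax recommendations:

```shell
# Poll `bin/nodetool status` until the local node reports UN (Up/Normal).
node_is_up() {
    # nodetool status prints one line per node; a line starting with
    # "UN" means the node is Up and in the Normal state.
    printf '%s\n' "$1" | grep -q '^UN'
}

wait_for_up() {
    tries=${1:-12}                      # poll up to N times, 5 seconds apart
    while [ "$tries" -gt 0 ]; do
        if node_is_up "$(bin/nodetool status 2>/dev/null)"; then
            echo "node is Up/Normal"
            return 0
        fi
        tries=$((tries - 1))
        sleep 5
    done
    echo "node did not come up in time" >&2
    return 1
}
```

Usage: `wait_for_up 12` waits up to about a minute before giving up.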

What's next