Installing a DataStax Enterprise cluster on Amazon EC2

This is a step-by-step guide to using the Amazon Web Services EC2 Management Console to set up a DataStax Enterprise cluster using the DataStax AMI (Amazon Machine Image). Installing DataStax Enterprise with the AMI allows you to quickly deploy a cluster with a pre-configured mixed workload. When you launch the AMI, you can specify the total number of nodes in your cluster and how many nodes should be Cassandra (transactional), DSE Analytics (Hadoop and Spark), or DSE Search (Solr).
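The node counts described above are typically passed to the AMI as a single option string in the instance's user data. The following is a minimal sketch of how such a string could be assembled and sanity-checked before launch. The helper name `build_user_data` is hypothetical, and the flag names (`--clustername`, `--totalnodes`, `--version`, `--analyticsnodes`, `--searchnodes`) are assumptions modeled on the AMI's user-data options; verify them against the AMI documentation for your release. The sketch also assumes that any node not assigned to DSE Analytics or DSE Search starts as a Cassandra (transactional) node, and it omits credentials and other options.

```python
import shlex


def build_user_data(cluster_name, total_nodes, analytics_nodes=0, search_nodes=0):
    """Build an AMI user-data option string and report the Cassandra node count.

    Flag names are assumptions; check them against the AMI documentation.
    """
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    if analytics_nodes < 0 or search_nodes < 0:
        raise ValueError("node counts cannot be negative")
    if analytics_nodes + search_nodes > total_nodes:
        raise ValueError("analytics and search nodes exceed the cluster size")

    # Assumption: nodes not assigned to Analytics or Search run as Cassandra nodes.
    cassandra_nodes = total_nodes - analytics_nodes - search_nodes

    options = [
        "--clustername", shlex.quote(cluster_name),
        "--totalnodes", str(total_nodes),
        "--version", "enterprise",
        "--analyticsnodes", str(analytics_nodes),
        "--searchnodes", str(search_nodes),
    ]
    return " ".join(options), cassandra_nodes


if __name__ == "__main__":
    # Hypothetical six-node mixed-workload cluster.
    user_data, cassandra_nodes = build_user_data(
        "myDSEcluster", 6, analytics_nodes=2, search_nodes=2
    )
    print(user_data)
    print("Cassandra nodes:", cassandra_nodes)
```

Validating the counts locally catches a mismatch (for example, more workload nodes than total nodes) before any instances are started.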

You can also launch a single node using the DataStax AMI and then use OpsCenter to create a cluster.

Note: Because Amazon changes the EC2 console without notice, the user interface and options might differ from those described here. For details, see the Amazon EC2 documentation.

For information about upgrading or expanding an existing installation, see Upgrading the DataStax AMI or Expanding a DataStax AMI cluster.

The DataStax AMI does the following:

  • Installs the latest version of DataStax Enterprise on an Ubuntu 12.04 LTS (Precise Pangolin) image (Ubuntu Cloud 20140227 release), Kernel 3.8+.
  • Installs Oracle Java 7.
  • Installs metrics and utility tools such as dstat, ethtool, make, gcc, and s3cmd.
  • Uses RAID0 ephemeral disks for data storage and commit logs.
  • Provides a choice of virtualization types: PV (paravirtualization) or HVM (hardware-assisted virtual machine).
  • Launches EBS-backed instances for faster start-up; EBS is not used for database storage.
  • Uses the private interface for intra-cluster communication.
  • Starts the nodes in the specified type: Cassandra (transactional), DSE Analytics, or DSE Search.
  • Sets the seed nodes cluster-wide.
  • Installs DataStax OpsCenter on the first node in the cluster (by default).

Note: The DataStax AMI does not install DataStax Enterprise nodes with virtual nodes enabled.

EC2 clusters spanning multiple regions and availability zones 

The DataStax AMI is intended for a single region and availability zone. When creating an EC2 cluster that spans multiple regions and availability zones, use OpsCenter to set up your cluster instead. You can use any of the supported platforms, but as a best practice, use the same platform on all nodes. If your cluster was instantiated using the DataStax AMI, use Ubuntu for the additional nodes. For details, see the OpsCenter provisioning topics.

Production considerations 

For production Cassandra clusters on EC2, see Production deployment planning. Configure the ephemeral disks as RAID0, and put both the data directory and the commit log on that volume. In practice, this has proved better than putting the commit log on the root volume, which is also a shared resource. For more data redundancy, consider deploying your Cassandra cluster across multiple availability zones or using OpsCenter to back up to S3.
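The RAID0 layout recommended above can be sketched as a short plan: stripe the ephemeral devices into one array, mount it, and point both Cassandra directories at that mount. This is an illustration under stated assumptions, not the AMI's own procedure. The helper name `raid0_plan`, the device names `/dev/xvdb` and `/dev/xvdc`, the `/raid0` mount point, and the ext4 filesystem are all hypothetical choices. The `mdadm` options and the `data_file_directories` and `commitlog_directory` settings of cassandra.yaml are standard, but the commands are only printed here, never executed.

```python
import shlex


def raid0_plan(devices, md_device="/dev/md0", mount_point="/raid0"):
    """Return (commands, settings) for a RAID0 data and commit log layout.

    Device names, mount point, and filesystem are assumptions for illustration.
    """
    if len(devices) < 2:
        raise ValueError("RAID0 striping needs at least two ephemeral devices")

    commands = [
        # Stripe all ephemeral devices into a single RAID0 array.
        ["mdadm", "--create", md_device, "--level=0",
         f"--raid-devices={len(devices)}", *devices],
        # Assumption: ext4; another filesystem may suit your workload better.
        ["mkfs.ext4", md_device],
        ["mkdir", "-p", mount_point],
        ["mount", md_device, mount_point],
    ]

    # Data directory and commit log both live on the RAID0 volume.
    settings = {
        "data_file_directories": [f"{mount_point}/cassandra/data"],
        "commitlog_directory": f"{mount_point}/cassandra/commitlog",
    }
    return commands, settings


if __name__ == "__main__":
    commands, settings = raid0_plan(["/dev/xvdb", "/dev/xvdc"])
    for command in commands:
        print(shlex.join(command))
    print(settings)
```

Keeping the plan as data makes it easy to review the exact commands before running them on an instance, where creating the array destroys any existing contents of the listed devices.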

Note: DSE Analytics and DSE Search workloads require their own nodes and disks and have specific hardware requirements. See Capacity Planning in the DataStax Enterprise Reference Architecture and the Hadoop and Solr documentation.