Installing a Cassandra cluster on Amazon EC2

A step-by-step guide for installing the DataStax Community AMI (Amazon Machine Image).

The DataStax Community AMI allows you to set up a simple DataStax Community cluster using the Amazon Web Services EC2 Management Console. Installing via the AMI allows you to quickly deploy a Cassandra 2.1 cluster within a single availability zone. To install a later version of Cassandra, create an instance of any supported platform using an AMI from a trusted source. Then use the appropriate install method for that platform.

You can use OpsCenter to simplify setting up a cluster.

The DataStax Community AMI does the following: 

  • Installs Cassandra 2.1 on an Ubuntu 12.04 LTS (Precise Pangolin) image (Ubuntu Cloud 20140227 release) with kernel 3.8+.
  • Installs Oracle Java 7.
  • Installs metrics tools such as dstat, ethtool, make, gcc, and s3cmd.
  • Uses RAID0 ephemeral disks for data storage and commit logs.
  • Offers a choice of PV (paravirtualization) or HVM (hardware-assisted virtual machine) instance types. See the Amazon documentation.
  • Launches EBS-backed instances for faster start-up; EBS is not used for database storage.
  • Uses the private interface for intra-cluster communication.
  • Sets the seed nodes cluster-wide.
  • Installs OpsCenter (by default).
Note: The DataStax AMI does not install DataStax Enterprise nodes with virtual nodes enabled.
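
As a rough illustration of the storage, networking, and seed behavior listed above, the following Python sketch builds the corresponding cassandra.yaml-style settings. The function name, the /raid0 mount point, the directory layout, and the IP addresses are assumptions made for this example; the AMI's actual paths and values may differ.

```python
import json


def cassandra_settings(private_ip, seed_ips, raid_mount="/raid0"):
    """Build cassandra.yaml-style settings mirroring the AMI behavior.

    raid_mount is a hypothetical mount point for the RAID0 ephemeral
    volume; it is an assumption, not a documented AMI path.
    """
    base = raid_mount.rstrip("/") + "/cassandra"
    return {
        # Intra-cluster traffic uses the private interface.
        "listen_address": private_ip,
        # The same seed list is set on every node in the cluster.
        "seed_provider": [{
            "class_name": "org.apache.cassandra.locator.SimpleSeedProvider",
            "parameters": [{"seeds": ",".join(seed_ips)}],
        }],
        # Data files and the commit log share the RAID0 volume.
        "data_file_directories": [base + "/data"],
        "commitlog_directory": base + "/commitlog",
    }


if __name__ == "__main__":
    # Example values only; substitute the private IPs of your own nodes.
    settings = cassandra_settings("10.0.0.11", ["10.0.0.11", "10.0.0.12"])
    print(json.dumps(settings, indent=2))
```

The keys shown (listen_address, seed_provider, data_file_directories, commitlog_directory) are standard cassandra.yaml options; the values are placeholders.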

To install a Cassandra cluster from the DataStax AMI, complete the following tasks:

Production considerations 

For production Cassandra clusters on EC2, see Planning an Amazon EC2 cluster. Combine the ephemeral disks into a RAID0 volume, and put both the data directory and the commit log on that volume. In practice, this has proved better than putting the commit log on the root volume, which is a shared resource. For more data redundancy, consider deploying your cluster across multiple availability zones or using OpsCenter to back up to S3.
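
When planning a deployment across multiple availability zones, it can help to spread nodes evenly so that no zone holds a disproportionate share of the data. The following Python sketch is one simple way to plan such a layout using round-robin assignment. The function name and zone names are hypothetical, and the sketch only produces a plan; it does not launch any instances.

```python
from collections import Counter


def assign_zones(node_count, zones):
    """Assign nodes to availability zones round-robin.

    The resulting per-zone node counts differ by at most one.
    """
    if not zones:
        raise ValueError("at least one availability zone is required")
    return [zones[i % len(zones)] for i in range(node_count)]


if __name__ == "__main__":
    # Example zone names only; use the zones available in your region.
    plan = assign_zones(6, ["us-east-1a", "us-east-1b", "us-east-1c"])
    for node, zone in enumerate(plan, start=1):
        print(f"node{node}: {zone}")
    print(dict(Counter(plan)))
```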