Important information for deploying a production Cassandra cluster on Amazon EC2.
Before planning an Amazon EC2 cluster, read Amazon EC2 - Virtual Server Hosting.
Use only AMIs for supported platforms and from a trusted source. AMIs of unknown origin pose a security risk and may perform worse than expected, depending on how the image was configured. The following are examples of trusted AMIs:
It is best practice to use the same platform on all nodes. If your cluster was instantiated using the DataStax AMI, use Ubuntu for the additional nodes. Configure the cluster as a multiple datacenter cluster using the Ec2MultiRegionSnitch.
For production clusters on EC2, use these guidelines for choosing the instance types:
- Development and light production: m3.large
- Low to moderate production: m3.xlarge
- SSD production with light data: c3.2xlarge
- Largest heavy production: m3.2xlarge (PV) or i3.2xlarge (HVM)
- Micro, small, and medium types are not supported.
Amazon EC2 Instance Types provides additional and up-to-date information on the capabilities and differences in instance types.
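The guidelines above can be captured as a small lookup table, for example to validate a provisioning request before launching nodes. This is only an illustrative sketch: the names `GUIDELINES`, `UNSUPPORTED_SIZES`, `is_supported`, and `candidates` are hypothetical, and the table simply transcribes the list in this section rather than querying any AWS API.

```python
# Instance-type guidelines transcribed from the list in this section.
# All names here are hypothetical helpers, not part of Cassandra or AWS tooling.
GUIDELINES = {
    "development and light production": ["m3.large"],
    "low to moderate production": ["m3.xlarge"],
    "ssd production with light data": ["c3.2xlarge"],
    # m3.2xlarge is the PV option, i3.2xlarge the HVM option.
    "largest heavy production": ["m3.2xlarge", "i3.2xlarge"],
}

# Sizes this section lists as unsupported.
UNSUPPORTED_SIZES = {"micro", "small", "medium"}


def is_supported(instance_type):
    """Return False for micro, small, and medium instance sizes."""
    _family, _, size = instance_type.partition(".")
    return bool(size) and size not in UNSUPPORTED_SIZES


def candidates(workload):
    """Return the instance types suggested for a workload description."""
    return GUIDELINES[workload.strip().lower()]
```

For instance, `candidates("Low to moderate production")` returns `["m3.xlarge"]`, and `is_supported("m3.medium")` returns `False`.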
EBS magnetic volumes are not recommended for Cassandra data storage volumes for the following reasons:
- EBS magnetic volumes contend directly for network throughput with standard packets. This contention means that EBS throughput is likely to fail when a network link is saturated.
- EBS magnetic volumes have unreliable performance. I/O can become exceptionally slow, causing reads and writes to back up until the entire cluster becomes unresponsive.
- Adding capacity by increasing the number of EBS volumes per host does not scale. You can easily surpass the ability of the system to keep effective buffer caches and concurrently serve requests for all of the data it is responsible for managing.
For more information and graphs related to ephemeral versus EBS performance, see Systematic Look at EC2 I/O.
To ensure high disk performance on mounted drives, pre-warm the drives by writing once to every drive location before production use. Depending on EC2 conditions, throughput gains can range from moderate to very large. See Optimizing Disk Performance in the Amazon Elastic Compute Cloud Documentation.
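As a rough illustration of what "writing once to every drive location" means, the sketch below overwrites an existing file or block device with zeros from start to end. It is a minimal sketch, not the procedure from the Amazon documentation: the function name `prewarm` and its `chunk_size` parameter are assumptions made for this example, and running it destroys any data already on the target.

```python
import os


def prewarm(path, chunk_size=1024 * 1024):
    """Write zeros once over every byte of an existing file or block device.

    WARNING: this overwrites all existing data at `path`.
    Returns the number of bytes written.
    """
    zeros = bytes(chunk_size)
    written = 0
    # Unbuffered read/write access; "r+b" does not truncate the target.
    with open(path, "r+b", buffering=0) as dev:
        size = dev.seek(0, os.SEEK_END)  # total size in bytes
        dev.seek(0)
        while written < size:
            remaining = size - written
            written += dev.write(zeros[:min(chunk_size, remaining)])
        os.fsync(dev.fileno())  # flush the writes to the underlying device
    return written
```

On a new, empty volume this might be called as `prewarm("/dev/xvdf")`, where `/dev/xvdf` is a hypothetical device name; writing to a raw device typically requires root privileges and must be done before a file system is created on it.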
Other resources for Amazon EC2 deployments
cassandra.yaml file. Additional information is available in the Handling Disk Failures In Cassandra 1.2 blog and Recovering using JBOD.