Upgrading Cassandra

This document corresponds to an earlier product version. Make sure you are using the documentation that corresponds to your product version.

This section describes how to upgrade an earlier version of Cassandra or DataStax Community Edition to Cassandra 1.1. This section contains the following topics:

Best Practices for Upgrading Cassandra

Before upgrading, back up your data, for example by taking a snapshot with nodetool snapshot, and review the release notes (NEWS.txt) for the version you are moving to.

Upgrading to Cassandra 1.1.9

If you are upgrading to Cassandra 1.1.9 from a version earlier than 1.1.7, all nodes must be upgraded before any streaming can take place. Until all nodes are upgraded, you cannot add nodes running version 1.1.7 or later to a cluster running a version earlier than 1.1.7.

Upgrade Steps for Binary Tarball and Packaged Releases Installations

Upgrading from version 0.8 or later can be done with a rolling restart, one node at a time. You do not need to bring down the whole cluster at once.

To upgrade a Binary Tarball Installation

  1. Save the cassandra.yaml file from the old installation to a safe place.
  2. On each node, download and unpack the binary tarball package from the downloads section of the Cassandra website.
  3. In the new installation, open cassandra.yaml.
  4. In the old installation, open the cassandra.yaml you saved.
  5. Diff the new and old cassandra.yaml files.
  6. Merge your changes by hand from the old file into the new one.
  7. Follow steps for completing the upgrade.
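The diff-and-merge steps above can be sketched as follows. The file names and contents here are illustrative stand-ins for the cassandra.yaml files in your actual old and new installations:

```shell
# Stand-in files for the old (customized) and new (default) configs.
printf 'cluster_name: ProdCluster\nrpc_port: 9160\n' > old-cassandra.yaml
printf 'cluster_name: Test Cluster\nrpc_port: 9160\n' > new-cassandra.yaml

# Review the differences between the two files.
diff -u old-cassandra.yaml new-cassandra.yaml || true

# Merge by hand: here the only customization is cluster_name, so carry
# it into the new file.
sed -i 's/^cluster_name: .*/cluster_name: ProdCluster/' new-cassandra.yaml
```

The new file then keeps the new release's defaults while preserving your site-specific settings.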

To upgrade a RHEL or CentOS Installation

  1. On each of your Cassandra nodes, run sudo yum install apache-cassandra1. The installer creates the file cassandra.yaml.rpmnew in /etc/cassandra/default.conf/.
  2. Open the old and new cassandra.yaml files and diff them.
  3. Merge the diffs by hand from the old file into the new one. Save the file as cassandra.yaml.
  4. Follow steps for completing the upgrade.
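On RHEL and CentOS, the installer leaves your existing cassandra.yaml untouched and writes the new defaults alongside it as cassandra.yaml.rpmnew (per the usual RPM convention). The sketch below simulates locating that file, using a temporary directory as a stand-in for /etc/cassandra/default.conf/:

```shell
# Simulate the packaged layout: the upgrade keeps your cassandra.yaml
# and drops the new defaults next to it as cassandra.yaml.rpmnew.
CONF=$(mktemp -d)
touch "$CONF/cassandra.yaml" "$CONF/cassandra.yaml.rpmnew"

# Locate the new default file to diff against your existing config.
find "$CONF" -name '*.rpmnew'
```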

To Upgrade a Debian or Ubuntu Installation

  1. Save the cassandra.yaml file from the old installation to a safe place.
  2. On each of your Cassandra nodes, run sudo apt-get install cassandra1.
  3. Open the old and new cassandra.yaml files and diff them.
  4. Merge the diffs by hand from the old file into the new one.
  5. Follow steps for completing the upgrade.

Completing the Upgrade

To complete the upgrade, perform the following steps:

  1. Account for New Parameters between 1.0 and 1.1 in cassandra.yaml.
  2. Make sure any client drivers, such as Hector or Pycassa clients, are compatible with the new version.
  3. Run nodetool drain before shutting down the existing Cassandra service. This will prevent overcounts of counter data, and will also speed up restart post-upgrade.
  4. Stop the old Cassandra process, then start the new binary process.
  5. Monitor the log files for any issues.
  6. If you are upgrading from Cassandra 1.1.3 or earlier to Cassandra 1.1.5 or later, skip steps 7 and 8 of this procedure and go to Completing the upgrade from Cassandra 1.1.3 or earlier to Cassandra 1.1.5 or later.
  7. After upgrading and restarting all Cassandra processes, restart client applications.
  8. After upgrading, run nodetool upgradesstables against each node before running repair, moving nodes, or adding new ones. If you are using Cassandra 1.0.3 or earlier, use nodetool scrub instead of nodetool upgradesstables.

Completing the upgrade from Cassandra 1.1.3 or earlier to Cassandra 1.1.5 or later

If you created column families having the CQL compaction_strategy_class storage option set to LeveledCompactionStrategy, you need to scrub the SSTables that store those column families.

First, upgrade all nodes to the latest Cassandra version, according to the platform-specific instructions presented earlier in this document. Next, complete steps 1-5 of Completing the Upgrade. At this point, all nodes are upgraded and started. Finally, follow these steps to scrub the affected SSTables:

To scrub SSTables:

  1. Shut down the nodes, one at a time.

  2. On each offline node, run the sstablescrub utility, which is located in <install directory>/bin (tarball distributions) or in /usr/bin (packaged distributions). Help for sstablescrub is:

    usage: sstablescrub [options] <keyspace> <column_family>
    --
    Scrub the sstable for the provided column family.
    --
    Options are:
        --debug              display stack traces
     -h,--help               display this help message
     -m,--manifest-check     only check and repair the leveled manifest,
                             without actually scrubbing the sstables
     -v,--verbose            verbose output
    

    For example, on a tarball installation:

    cd <install directory>/bin
    ./sstablescrub mykeyspace mycolumnfamily
    
  3. Restart each node and its client applications, one node at a time.

If you do not scrub the affected SSTables, you might encounter the following error during compactions on column families using LeveledCompactionStrategy:

ERROR [CompactionExecutor:150] 2012-07-05 04:26:15,570 AbstractCassandraDaemon.java (line 134)
  Exception in thread Thread[CompactionExecutor:150,1,main]
  java.lang.AssertionError
  at org.apache.cassandra.db.compaction.LeveledManifest.promote
  (LeveledManifest.java:214)

Upgrading Between Minor Releases of Cassandra 1.1.x

The upgrade procedure between minor releases of Cassandra 1.1.x is identical to the upgrade procedure between major releases with one exception: Do not perform the last step of Completing the Upgrade to run nodetool upgradesstables or nodetool scrub after upgrading.

New and Changed Features

The following list provides information about new and changed features in Cassandra 1.1. Also see New Parameters between 1.0 and 1.1.

  • Compression is enabled by default on newly created column families and remains unchanged for column families created prior to upgrading.

  • If you are running multiple data centers, upgrade to the latest 1.0.x (or 0.8.x) release before upgrading to this version. Versions 0.8.8 and 1.0.3-1.0.5 generate cross-data center forwarding that is incompatible with 1.1.

    Cross-data center forwarding is an optimization of cross-data center replication: if DC1 needs to replicate a write to three replicas in DC2, only one message is sent across data centers; one node in DC2 forwards the message to the other two replicas, instead of sending three messages across data centers.

  • The EACH_QUORUM ConsistencyLevel is supported only for writes and now throws an InvalidRequestException when used for reads. (Previous versions would silently perform a LOCAL_QUORUM read.)

  • The ANY ConsistencyLevel is supported only for writes and now throws an InvalidRequestException when used for reads. (Previous versions would silently perform a ONE read for range queries; single-row and multiget reads already rejected ANY.)

  • The largest mutation batch accepted by the commitlog is now 128MB. (In practice, batches larger than ~10MB always caused poor performance due to load volatility and GC promotion failures.) Larger batches will continue to be accepted but are not durable. Consider setting durable_writes=false if you really want to use such large batches.

  • The key and row caches are now global. Use the key_cache_{size_in_mb, save_period} and row_cache_{size_in_mb, save_period} settings in the conf/cassandra.yaml configuration file instead of the per-column family options.

  • JMX methods no longer return custom Cassandra objects. JMX methods now return standard Maps, Lists, and so on.

  • Hadoop input and output details are now separated. If you were previously using methods such as getRpcPort you now need to use getInputRpcPort or getOutputRpcPort, depending on the circumstance.

  • CQL changes: Prior to Cassandra 1.1, you could use the CQL 2 keyword KEY in place of the actual name of the primary key column in some SELECT statements. In Cassandra 1.1 and later, you must use the name of the primary key column (for example, WHERE user_id = ... instead of WHERE KEY = ...).

  • The sliced_buffer_size_in_kb option has been removed from the cassandra.yaml configuration file (this option was a no-op since 1.0).
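As a sketch, the global cache settings mentioned above might appear in conf/cassandra.yaml as follows. The values shown are the 1.1 defaults listed under New Parameters between 1.0 and 1.1; an empty key_cache_size_in_mb accepts the default sizing:

```yaml
# Global key/row cache settings (these replace the former
# per-column family cache options).
key_cache_size_in_mb:           # empty: use the default sizing
key_cache_save_period: 14400    # seconds (4 hours)
row_cache_size_in_mb: 0         # 0 disables the row cache
row_cache_save_period: 0        # 0 disables periodic saving
```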

New Parameters between 1.0 and 1.1

This table lists the parameters in the cassandra.yaml configuration file that changed between 1.0 and 1.1. See the cassandra.yaml reference for details on these parameters.

Option                                  Default Value

1.1 Release
key_cache_size_in_mb                    empty
key_cache_save_period                   14400 (4 hours)
row_cache_size_in_mb                    0 (disabled)
row_cache_save_period                   0 (disabled)

1.0 Release (Column Family Attributes)
key_cache_size                          2MB (ignored in 1.1)
key_cache_save_period_in_seconds        N/A
row_cache_size                          0 (ignored in 1.1)
row_cache_save_period_in_seconds        N/A