Keyspace concepts

A keyspace is the top-level database object. It controls replication for the objects it contains at each datacenter in the cluster. A keyspace is analogous to an SQL database.

Keyspaces contain tables, materialized views, user-defined types, functions, and aggregates. Typically, a cluster has one keyspace per application. Because replication is controlled on a per-keyspace basis, store data with different replication requirements (at the same datacenter) in different keyspaces. Beyond controlling replication, keyspaces are not a significant layer within the data model.

Two settings must be defined for a keyspace: the replication strategy and the replication factor. Both are defined in a replication map.
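
As a minimal sketch of a replication map, the CQL statement below creates a keyspace with both required settings. The keyspace name cycling and the replication factor of 1 are illustrative assumptions, suited only to a single-node evaluation setup.

```cql
-- Hypothetical keyspace name: cycling.
-- 'class' sets the replication strategy;
-- 'replication_factor' sets the number of replicas.
CREATE KEYSPACE IF NOT EXISTS cycling
  WITH replication = {
    'class': 'SimpleStrategy',
    'replication_factor': 1
  };
```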

Replication strategy

Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the nodes where replicas are placed. The replication strategy is a required attribute of a keyspace.

Three replication strategy classes exist, of which two are available to users:

SimpleStrategy: Use only for a single datacenter and one rack. Good for evaluation or development of simple data models and small datasets. If you ever intend to use more than one datacenter, use NetworkTopologyStrategy instead. It is possible to alter the keyspace later and switch the replication strategy from SimpleStrategy to NetworkTopologyStrategy.
NetworkTopologyStrategy (NTS): Highly recommended for most deployments because it allows future expansion to multiple datacenters when required. Can be used for both development and production, or for working with mixed workloads. As a rule, do not alter a keyspace to change it from NTS to SimpleStrategy.
EverywhereStrategy: A specialized enterprise strategy used by the dse_system keyspace and not intended for customer use.

When you create or modify a keyspace, specify the replication strategy that the keyspace uses to replicate its data.
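
For illustration, creating a keyspace with NetworkTopologyStrategy, and switching an existing SimpleStrategy keyspace to it, might look like the sketch below. The keyspace names cycling and cycling_prod and the datacenter name DC1 are assumptions; the datacenter name must match the name that the cluster's snitch reports.

```cql
-- Create: NetworkTopologyStrategy takes one replication factor per datacenter.
-- Hypothetical names: cycling_prod (keyspace), DC1 (datacenter).
CREATE KEYSPACE IF NOT EXISTS cycling_prod
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};

-- Modify: switch an existing keyspace (assumed to have been created with
-- SimpleStrategy) to NetworkTopologyStrategy.
ALTER KEYSPACE cycling
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};
```

Altering the replication settings does not by itself move existing data to the new replicas; a repair (for example, with nodetool repair) is typically run on the affected nodes afterward.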

If you set NetworkTopologyStrategy for evaluation purposes, you must change the default snitch, SimpleSnitch, to a network-aware snitch. Choose a snitch and define one or more datacenter names in the snitch properties file, and then use the datacenter names to set the keyspace replication factor. For example, if the cluster uses:

the GossipingPropertyFileSnitch - create the keyspace using the user-defined datacenter and rack names in the cassandra-rackdc.properties file.
the Amazon EC2 single-region snitch - create the keyspace using EC2 datacenter and rack names.
the Google Cloud Platform snitch - create the keyspace using GoogleCloud datacenter and rack names.
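
The GossipingPropertyFileSnitch case can be sketched as follows, assuming that cassandra-rackdc.properties on each node sets dc=DC1 and rack=RAC1 (illustrative values, as is the keyspace name cycling). Querying system.local shows the names the connected node actually reports.

```cql
-- Check the datacenter and rack names reported by the connected node.
SELECT data_center, rack FROM system.local;

-- Use the datacenter name exactly as reported; names are case-sensitive.
-- Hypothetical names: cycling (keyspace), DC1 (datacenter).
CREATE KEYSPACE IF NOT EXISTS cycling
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};
```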

Replication factor

The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row in the cluster. If the node containing the row goes down, the row cannot be retrieved. A replication factor of 2 means two copies of each row, where each copy is on a different node.

All replicas are equally important; there is no primary or master replica. As a general rule, the replication factor should not exceed the number of nodes in the cluster. However, you can increase the replication factor and then add the desired number of nodes later.
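
As a hedged example, the statements below inspect a keyspace's current replication settings and then raise its replication factor. The keyspace name cycling and the datacenter names DC1 and DC2 are assumptions, and both datacenters are assumed to already exist in the cluster.

```cql
-- Inspect the current replication map of the keyspace.
SELECT keyspace_name, replication
  FROM system_schema.keyspaces
  WHERE keyspace_name = 'cycling';

-- Set 3 replicas in DC1 and 2 replicas in DC2.
ALTER KEYSPACE cycling
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2};
```

With NetworkTopologyStrategy the factor is set per datacenter, so this sketch yields five replicas of each row across the cluster. After increasing a replication factor, a repair is typically needed so that the new replicas receive the existing data.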
