Keyspace concepts
A keyspace is the top-level database object that controls replication for the objects it contains at each datacenter in the cluster. A keyspace is analogous to an SQL database.
Keyspaces contain tables, materialized views, user-defined types, functions, and aggregates. Typically, a cluster has one keyspace per application. Because replication is controlled on a per-keyspace basis, store data with different replication requirements (in the same datacenter) in different keyspaces. Keyspaces are not a significant map layer within the data model.
Two settings must be defined for a keyspace: the replication strategy and the replication factor. Both are specified in a replication map.
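As a minimal sketch of a replication map, the CQL statement below creates a keyspace that names a replication strategy class and a replication factor for one datacenter. The keyspace name `app_ks` and the datacenter name `dc1` are placeholders, not values from this page; the datacenter name has to match the name the cluster itself reports for that datacenter.

```cql
-- Hypothetical keyspace and datacenter names; adjust them to your cluster.
CREATE KEYSPACE IF NOT EXISTS app_ks
  WITH replication = {
    'class': 'NetworkTopologyStrategy',  -- replication strategy
    'dc1': 3                             -- replication factor for datacenter dc1
  };
```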
Replication strategy
Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the nodes where replicas are placed. The replication strategy is a required attribute of a keyspace.
Three replication strategy classes exist, of which two are available to users:
SimpleStrategy - Use only for a single datacenter and one rack. If you ever intend to assign more than one datacenter, use NetworkTopologyStrategy. It is possible to alter the keyspace later and switch the replication strategy from SimpleStrategy to NetworkTopologyStrategy. Good for evaluation or development of simple data models and small datasets.
NetworkTopologyStrategy (NTS) - Highly recommended for most deployments; allows future expansion to multiple datacenters when required. Can be used for both development and production, or working with mixed workloads. Generally speaking, you should never alter a keyspace to change NTS to SimpleStrategy.
EveryWhereStrategy - A specialized enterprise strategy used by the dse_system keyspace and not intended for customer use.
When you create or modify a keyspace, specify the replication strategy used to replicate its data.
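The two user-facing strategies might be specified as in the sketch below: a development keyspace created with SimpleStrategy, then altered to NetworkTopologyStrategy. The names `dev_ks` and `dc1` are assumptions made for illustration only.

```cql
-- Development keyspace on a single datacenter and rack (hypothetical name).
CREATE KEYSPACE IF NOT EXISTS dev_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Later switch the same keyspace to NetworkTopologyStrategy.
-- 'dc1' is an assumed datacenter name and must match the cluster's own name.
ALTER KEYSPACE dev_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 1};
```

With SimpleStrategy the factor is given under the `replication_factor` key, while NetworkTopologyStrategy takes one entry per datacenter.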
Replication factor
The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row in the cluster. If the node containing the row goes down, the row cannot be retrieved. A replication factor of 2 means two copies of each row, where each copy is on a different node.
All replicas are equally important; there is no primary or master replica. As a general rule, the replication factor should not exceed the number of nodes in the cluster. However, you can increase the replication factor and then add the desired number of nodes later.
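Raising the replication factor of an existing keyspace could look like the following sketch. The keyspace `app_ks` and the datacenter `dc1` are hypothetical, and the statement assumes the keyspace already uses NetworkTopologyStrategy.

```cql
-- Set the replication factor for datacenter dc1 to 3 (assumed names).
ALTER KEYSPACE app_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
```

Altering the replication map changes where replicas belong, but it does not by itself copy existing rows to the new replicas; a repair is typically run afterward so that the added replicas receive the data.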