Keyspace concepts

A keyspace is the top-level database object. It controls how the objects it contains are replicated at each datacenter in the cluster. A keyspace is analogous to an SQL database.

Keyspaces contain tables and materialized views, as well as user-defined types, functions, and aggregates. Typically, a cluster has one keyspace per application. Because replication is controlled on a per-keyspace basis, store data with different replication requirements (at the same datacenter) in different keyspaces. Keyspaces are not intended to serve as a significant mapping layer within the data model.

Two settings must be defined for a keyspace: the replication strategy and the replication factor. Both are specified in a replication map.
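As a minimal sketch of what a replication map looks like, the Python below renders a dictionary as a CQL map literal and embeds it in a `CREATE KEYSPACE` statement. The helper names and the keyspace name `demo_ks` are hypothetical and chosen only for illustration. The code just builds a string and does not connect to a cluster.

```python
def replication_map_literal(replication: dict) -> str:
    """Render a dict as a CQL map literal; string values are quoted, integers are not."""
    parts = []
    for key, value in replication.items():
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        parts.append(f"'{key}': {rendered}")
    return "{" + ", ".join(parts) + "}"


def create_keyspace_cql(name: str, replication: dict) -> str:
    """Build a CREATE KEYSPACE statement from a keyspace name and a replication map."""
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH replication = {replication_map_literal(replication)};"
    )


# The map carries both required settings: the strategy class and the replication factor.
statement = create_keyspace_cql(
    "demo_ks",  # hypothetical keyspace name
    {"class": "SimpleStrategy", "replication_factor": 1},
)
print(statement)
```

The resulting statement can be pasted into cqlsh or passed to a driver session.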

Replication strategy

Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the nodes where replicas are placed. The replication strategy is a required attribute of a keyspace.

Three replication strategy classes exist, of which two are available to users:

SimpleStrategy

Use only for a single datacenter and one rack. Good for evaluation or development of simple data models and small datasets. If you ever intend to use more than one datacenter, use NetworkTopologyStrategy instead. It is possible to alter the keyspace later and switch the replication strategy from SimpleStrategy to NetworkTopologyStrategy.

NetworkTopologyStrategy (NTS)

Highly recommended for most deployments because it allows future expansion to multiple datacenters when required. Can be used for both development and production, and for mixed workloads. As a rule, do not alter a keyspace to change its strategy from NetworkTopologyStrategy to SimpleStrategy.

EverywhereStrategy

A specialized enterprise strategy used by the dse_system keyspace and not intended for customer use.

When you create or modify a keyspace, specify the replication strategy that the keyspace uses.
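Modifying a keyspace to move from SimpleStrategy to NetworkTopologyStrategy can be sketched as an `ALTER KEYSPACE` statement. The Python below only builds that statement as text. The function name, the keyspace name `demo_ks`, and the datacenter names `DC1` and `DC2` are assumptions made for illustration.

```python
def alter_to_nts_cql(keyspace: str, dc_factors: dict) -> str:
    """Build an ALTER KEYSPACE statement that sets NetworkTopologyStrategy.

    dc_factors maps each datacenter name to its replication factor.
    """
    if not dc_factors:
        raise ValueError("at least one datacenter is required")
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in dc_factors.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Hypothetical keyspace and datacenter names.
statement = alter_to_nts_cql("demo_ks", {"DC1": 3, "DC2": 2})
print(statement)
```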

Should you choose to set NetworkTopologyStrategy for evaluation purposes, you must change the default snitch, SimpleSnitch, to a network-aware snitch. Choose a snitch and define one or more datacenter names in the snitch properties file, and then use the datacenter name(s) to set the keyspace replication factor. For example, if the cluster uses:
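As a hedged illustration of carrying a datacenter name from the snitch configuration into the replication map, the sketch below assumes GossipingPropertyFileSnitch, which reads `dc=` and `rack=` entries from `cassandra-rackdc.properties`. The file contents, the names `DC1` and `RAC1`, and both helper functions are hypothetical. The name used in the replication map must match the snitch's datacenter name exactly, including case.

```python
def datacenter_from_rackdc(properties_text: str) -> str:
    """Return the value of the dc= entry in cassandra-rackdc.properties-style text."""
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "dc":
            return value.strip()
    raise ValueError("no dc= entry found")


def nts_replication(dc_factors: dict) -> dict:
    """Build a NetworkTopologyStrategy replication map keyed by datacenter name."""
    return {"class": "NetworkTopologyStrategy", **dc_factors}


# Hypothetical properties file contents for one node.
sample_properties = "# assumed example contents\ndc=DC1\nrack=RAC1\n"

dc_name = datacenter_from_rackdc(sample_properties)
replication = nts_replication({dc_name: 3})
print(replication)
```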

Replication factor

The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row in the cluster. If the node containing the row goes down, the row cannot be retrieved. A replication factor of 2 means two copies of each row, where each copy is on a different node.

All replicas are equally important; there is no primary or master replica. As a general rule, the replication factor should not exceed the number of nodes in the cluster. However, you can increase the replication factor and then add the desired number of nodes later.
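The arithmetic behind these rules can be sketched in a few lines. With per-datacenter factors, the total number of replicas is their sum, and a factor larger than the number of nodes in its datacenter indicates that nodes still need to be added. The function names, datacenter names, and node counts below are hypothetical.

```python
def total_replicas(dc_factors: dict) -> int:
    """Total copies of each row across the cluster: the sum of per-datacenter factors."""
    return sum(dc_factors.values())


def datacenters_exceeding_nodes(dc_factors: dict, nodes_per_dc: dict) -> list:
    """List datacenters whose replication factor is larger than their node count."""
    return sorted(
        dc for dc, rf in dc_factors.items() if rf > nodes_per_dc.get(dc, 0)
    )


# Hypothetical cluster layout: DC2 has fewer nodes than its replication factor.
factors = {"DC1": 3, "DC2": 2}
nodes = {"DC1": 3, "DC2": 1}

print(total_replicas(factors))
print(datacenters_exceeding_nodes(factors, nodes))
```

In this layout, `DC2` would need at least one more node before every row could have two replicas there.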

© 2024 DataStax