CREATE KEYSPACE

Define a new keyspace and its replica placement strategy.

Synopsis

CREATE ( KEYSPACE | SCHEMA ) IF NOT EXISTS keyspace_name
WITH REPLICATION = map
AND DURABLE_WRITES = ( true | false )

map is a map collection, a JSON-style array of literals:

{ literal : literal, literal : literal ... }

Table 1. Legend
  • Uppercase means literal
  • Lowercase means not literal
  • Italics mean optional
  • The pipe (|) symbol means OR or AND/OR
  • Ellipsis (...) means repeatable

The IF NOT EXISTS clause and the AND DURABLE_WRITES clause are optional. A semicolon that terminates CQL statements is not included in the synopsis.

Description

CREATE KEYSPACE creates a top-level namespace and sets the keyspace name, replica placement strategy class, replication factor, and DURABLE_WRITES options for the keyspace. For information about the replica placement strategy, see Apache Cassandra 2.1 replica placement strategy.

When you configure NetworkTopologyStrategy as the replication strategy, you set up one or more virtual data centers. Alternatively, you use the default data center. Use the same names for data centers as those used by the snitch. For information about the snitch, see Apache Cassandra 2.1 snitch documentation.

You assign different nodes, depending on the type of workload, to separate data centers. For example, assign Hadoop nodes to one data center and Cassandra real-time nodes to another. Segregating workloads ensures that only one type of workload is active per data center. The segregation prevents incompatibility problems between workloads, such as different batch requirements that affect performance.

A map of properties and values defines the two different types of keyspaces:

{ 'class' : 'SimpleStrategy', 'replication_factor' : <integer> };

{ 'class' : 'NetworkTopologyStrategy'[, '<data center>' : <integer>, '<data center>' : <integer>] . . . };

Table 2. Table of map properties and values

Property               | Value                                           | Value Description
-----------------------+-------------------------------------------------+------------------
'class'                | 'SimpleStrategy' or 'NetworkTopologyStrategy'   | Required. The name of the replica placement strategy class for the new keyspace.
'replication_factor'   | <number of replicas>                            | Required if class is SimpleStrategy; otherwise, not used. The number of replicas of data on multiple nodes.
'<first data center>'  | <number of replicas>                            | Required if class is NetworkTopologyStrategy and you provide the name of the first data center. The number of replicas of data in the first data center.
'<next data center>'   | <number of replicas>                            | Required if class is NetworkTopologyStrategy and you provide the name of the second data center. The number of replicas of data in that data center.
. . .                  | . . .                                           | More replication factors for optional named data centers.

CQL property map keys must be lowercase. For example, class and replication_factor are correct. Keyspace names consist of 48 or fewer alphanumeric characters and underscores, and the first character must be a letter. Unquoted keyspace names are case-insensitive. To make a name case-sensitive, enclose it in double quotation marks.

You can use the alias CREATE SCHEMA instead of CREATE KEYSPACE. Attempting to create an already existing keyspace will return an error unless the IF NOT EXISTS option is used. If the option is used, the statement will be a no-op if the keyspace already exists.
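As a brief illustration of the alias and the IF NOT EXISTS option together, the following sketch creates a keyspace and then repeats the statement. The keyspace name demo_ks is hypothetical and is used only for this example; the replication settings are the simplest ones suitable for a single-node evaluation cluster.

```cql
-- demo_ks is a hypothetical keyspace name used only for illustration.
CREATE SCHEMA IF NOT EXISTS demo_ks
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

-- Repeating the statement is a no-op instead of an error,
-- because the keyspace already exists.
CREATE SCHEMA IF NOT EXISTS demo_ks
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
```

Without IF NOT EXISTS, the second statement would return an error. This makes the option convenient in setup scripts that may be run more than once.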

Example of setting the SimpleStrategy class

To construct the CREATE KEYSPACE statement, first declare the name of the keyspace, followed by the WITH REPLICATION keywords and the equals symbol. The name of the keyspace is case-insensitive unless enclosed in double quotation marks. Next, to create a keyspace that is not optimized for multiple data centers, use SimpleStrategy for the class value in the map. Set the replication_factor property as a key-value pair, with the key and value separated by a colon, and enclose the map in curly brackets. For example:

CREATE KEYSPACE Excelsior
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

Using SimpleStrategy is fine for evaluating Cassandra. For production use or for use with mixed workloads, use NetworkTopologyStrategy.

Example of setting the NetworkTopologyStrategy class

Using NetworkTopologyStrategy is also fine for evaluating Cassandra. To use NetworkTopologyStrategy for evaluation purposes, for example on a single-node cluster, specify the default data center name of the cluster. To determine the default data center name, use nodetool status.

$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns    Host ID                               Rack
UN  127.0.0.1  46.59 KB   256     100.0%  dd867d15-6536-4922-b574-e22e75e46432  rack1

Cassandra uses datacenter1 as the default data center name. Create a keyspace named NTSkeyspace on a single-node cluster, for example:

CREATE KEYSPACE NTSkeyspace WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 };

To use NetworkTopologyStrategy with data centers in a production environment, change the default snitch, SimpleSnitch, to a network-aware snitch, define one or more data center names in the snitch properties file, and use those data center names to define the keyspace. Otherwise, Cassandra fails to find a node to complete a write request, such as inserting data into a table.

After configuring Cassandra to use a network-aware snitch, such as the PropertyFileSnitch, you define data center and rack names in the cassandra-topology.properties file.

Construct the CREATE KEYSPACE statement using NetworkTopologyStrategy for the class value in the map. Set one or more key-value pairs consisting of the data center name and number of replicas per data center, separated by a colon and enclosed in curly brackets. For example:

CREATE KEYSPACE "Excalibur"
  WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};

This example sets three replicas for a data center named dc1 and two replicas for a data center named dc2. The data center names you use depend on the snitch configured for the cluster: each data center name in the map must match a data center name as recognized by that snitch. If you are not sure what the names are, the nodetool status command prints the data center names and rack locations of your nodes.

Setting DURABLE_WRITES

You can set the DURABLE_WRITES option after the map specification of the CREATE KEYSPACE command. When set to false, data written to the keyspace bypasses the commit log. Be careful using this option because you risk losing data. Do not set this attribute on a keyspace using the SimpleStrategy.

CREATE KEYSPACE Risky
  WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy',
  'datacenter1' : 3 } AND DURABLE_WRITES = false;
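
To confirm the setting for a single keyspace, a narrower query than a full listing can help. The sketch below assumes Cassandra 2.1, where keyspace metadata is stored in the system.schema_keyspaces table, and assumes the Risky keyspace from the preceding example exists. Because the name was not quoted when the keyspace was created, it is stored in lowercase.

```cql
-- Assumes the Risky keyspace was created as shown above.
-- Unquoted keyspace names are stored in lowercase, hence 'risky'.
SELECT keyspace_name, durable_writes
  FROM system.schema_keyspaces
  WHERE keyspace_name = 'risky';
```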

Checking created keyspaces

Check that the keyspaces were created:

SELECT * FROM system.schema_keyspaces;

        
keyspace_name  | durable_writes | strategy_class                                       | strategy_options
---------------+----------------+------------------------------------------------------+----------------------------
     excelsior |           True |          org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"3"}
     Excalibur |           True | org.apache.cassandra.locator.NetworkTopologyStrategy |      {"dc2":"2","dc1":"3"}
         risky |          False | org.apache.cassandra.locator.NetworkTopologyStrategy |        {"datacenter1":"3"}
        system |           True |           org.apache.cassandra.locator.LocalStrategy |                         {}
 system_traces |           True |          org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"1"}
      
(5 rows)

Cassandra converted the Excelsior keyspace name to lowercase because quotation marks were not used when the keyspace was created, and retained the initial capital letter of Excalibur because quotation marks were used.
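
The same case rules apply whenever a keyspace is referenced later. As a minimal sketch, assuming the Excelsior and "Excalibur" keyspaces from the examples above exist, the following statements show how quoting affects name resolution:

```cql
-- Unquoted names are folded to lowercase, so both statements
-- select the keyspace stored as excelsior.
USE Excelsior;
USE excelsior;

-- A keyspace created with a quoted, mixed-case name must be
-- quoted the same way whenever it is referenced.
USE "Excalibur";
```

An unquoted reference such as USE Excalibur would be folded to excalibur, which does not match the keyspace created in these examples.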