Create a table

To create a table, you must define the table name, column names, column data types, and a primary key. The cqlsh command-line tool is the primary tool for interacting with CQL to define schema.

Add the optional WITH clause and keyword arguments to configure table properties (caching, compaction, etc.) as described in the table options.

Dynamic schema generation and schema collision

It is important to note that dynamic schema generation is not supported. Schema collisions can occur if multiple clients attempt to generate tables simultaneously.

If a schema collision occurs, do the following:

Run a rolling restart on all nodes to ensure the schema matches. Run nodetool describecluster on all nodes. Ensure that there is only one schema version.
On each node, examine the data directory for the table to fix. The default location for the data directory is /var/lib/cassandra/data. If there is only one directory for the table, go to the next node and repeat this step. If there are two or more directories for the table, continue to the next step.

If there are two directories, the older directory contains the old table data before the update. The newer directory contains the new data after the update.

Use CQL to find the table’s newest id in system_schema.tables:

SELECT id, keyspace_name, table_name FROM system_schema.tables
  WHERE keyspace_name = 'cycling' AND table_name = 'cyclist_name';

Result

 id                                   | keyspace_name | table_name
--------------------------------------+---------------+--------------
 c2784ad1-065d-11ea-bd8d-9f9b8a53b5f5 |       cycling | cyclist_name

(1 rows)

Move the data from the older table to the newer table’s directory, and then remove the older directory. Repeat this step as needed.
Run nodetool refresh.

Prerequisites

Create a keyspace
Understand the use and importance of the primary key in table schema.
Determine the data types that each column will require.
Start cqlsh.

Keyspace and table name conventions

The name of a keyspace or a table is a string of alphanumeric characters and underscores, but it must begin with a letter. If case must be maintained, the name must be encased in double quotes, such as "MyTable".

Since tables are defined within a keyspace, you can either use the keyspace as part of the table creation command, or create a table in the current keyspace. To specify the keyspace as part of a table name, use the keyspace name, a period (.), and table name, such as cycling.cyclist-stats.

Simple table with single partition key in current keyspace

To create a table in the current keyspace, you can specify just the new table name:

CREATE TABLE cycling.cyclist_name (
  id UUID PRIMARY KEY,
  lastname text,
  firstname text
);

Result

CREATE TABLE cycling.cyclist_name (
    id uuid PRIMARY KEY,
    firstname text,
    lastname text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

In this example, first the cycling keyspace is selected, and then a table is created with three columns, the first of which is defined as a single PRIMARY KEY. That column, id, is a single partition key.

Simple table with single partition key in another keyspace

To create a table in another keyspace, you can specify both the keyspace and new table name:

CREATE TABLE cycling.cyclist_name (
  id UUID PRIMARY KEY,
  lastname text,
  firstname text
);

Result

CREATE TABLE cycling.cyclist_name (
    id uuid PRIMARY KEY,
    firstname text,
    lastname text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

The difference between the last example and this one is just that the keyspace name is used in the table creation.

Table with primary key separately specified

To define the PRIMARY KEY with one or more columns, add an additional item to the table creation:

CREATE TABLE cycling.cyclist_name (
  id UUID,
  lastname text,
  firstname text,
  PRIMARY KEY (id)
);

Result

CREATE TABLE cycling.cyclist_name (
    id uuid PRIMARY KEY,
    firstname text,
    lastname text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

The difference between the last example and this one is just that the PRIMARY KEY is specified separately.

Table with name of mixed case

To name a table with a mixed case, use double quotes:

CREATE TABLE cycling."addThis" (
  a int PRIMARY KEY
);

Result

CREATE TABLE cycling.addthis (
    a int PRIMARY KEY
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

The double quotes ("…") around the table name ensure that the table is created with the mixed case name.

Unless you need keyspaces or tables with uppercase or mixed case names, DataStax recommends using lowercase only. If your keyspace or table creation commands use uppercase without double quotes, the name is normalized to lowercase.

Table with multi-column (composite) partition key

To define the PRIMARY KEY with a partition key of one or more columns:

CREATE TABLE IF NOT EXISTS cycling.rank_by_year_and_name (
  race_year int,
  race_name text,
  cyclist_name text,
  rank int,
  PRIMARY KEY ((race_year, race_name), rank)
);

Result

CREATE TABLE cycling.rank_by_year_and_name (
    race_year int,
    race_name text,
    rank int,
    cyclist_name text,
    PRIMARY KEY ((race_year, race_name), rank)
) WITH CLUSTERING ORDER BY (rank ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

In this example, race_year and race_name are the two columns that comprise the composite partition key. They are encased in parentheses, inside the PRIMARY KEY definition. The column rank is an additional column that is a clustering column, as the next section describes.

Table with clustering columns

The last example showed one example of using a clustering column. Clustering columns in a primary key are all columns listed that are not part of the partition key. The partition key, as shown above, is either the first column defined in the primary key, or a composite partition key encased in parentheses.

Here is another example to make clear the parts of a primary key:

CREATE TABLE IF NOT EXISTS cycling.cyclist_category (
  category text,
  points int,
  id UUID,
  lastname text,
  PRIMARY KEY (category, points)
)
WITH CLUSTERING ORDER BY (points DESC);

Result

CREATE TABLE cycling.cyclist_category (
    category text,
    points int,
    id uuid,
    lastname text,
    PRIMARY KEY (category, points)
) WITH CLUSTERING ORDER BY (points DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

In this example, The category column is the partition key, and the points column is a clustering column. It is important to remember that the primary key must uniquely identify a row within the cyclist_category table. More than one row with the same category can exist, as long as that row contains a different points value. All the rows with the same category will sort by the points values in descending order due to the addition of the table option WITH CLUSTER ORDER BY. Normally, columns are sorted in ascending alphabetical order.

Table with a frozen UDT

Create the race_winners table that has a frozen user-defined type (UDT):

CREATE TABLE IF NOT EXISTS cycling.race_winners (
  cyclist_name FROZEN<fullname>,
  race_name text,
  race_position int,
  PRIMARY KEY (race_name, race_position)
);

Result

CREATE TABLE cycling.race_winners (
    race_name text,
    race_position int,
    cyclist_name frozen<fullname>,
    PRIMARY KEY (race_name, race_position)
) WITH CLUSTERING ORDER BY (race_position ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

See Creating a user-defined type for information on creating UDTs. UDTs can be created unfrozen if only non-collection fields are used in the user-defined type creation. If the table is created with an unfrozen UDT, then individual field values can be updated and deleted.

Table with a geospatial type

CREATE TABLE cycling.geospatial (
  id text PRIMARY KEY,
  point 'PointType',
  linestring 'LineStringType'
);

Result

CREATE TABLE cycling.geospatial (
    id text PRIMARY KEY,
    linestring 'org.apache.cassandra.db.marshal.LineStringType',
    point 'org.apache.cassandra.db.marshal.PointType'
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

Table with a CDC log

CDC logging must be enabled in cassandra.yaml.

Before enabling CDC logging, have a plan for moving and consuming the log information. After the disk space limit is reached, writes to CDC-enabled tables are rejected until more space is freed. See Change-data-capture (CDC) space settings for information about available CDC settings.

Create a change data capture log for the cyclist_id table:

CREATE TABLE IF NOT EXISTS cycling.cyclist_id (
  lastname text,
  firstname text,
  age int,
  id UUID,
  PRIMARY KEY ((lastname, firstname), age)
);

Result

CREATE TABLE cycling.cyclist_id (
    lastname text,
    firstname text,
    age int,
    id uuid,
    PRIMARY KEY ((lastname, firstname), age)
) WITH CLUSTERING ORDER BY (age ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

Storing data in descending order

The following example shows a table definition that stores the categories with the highest points first.

CREATE TABLE IF NOT EXISTS cycling.cyclist_category (
  category text,
  points int,
  id UUID,
  lastname text,
  PRIMARY KEY (category, points)
)
WITH CLUSTERING ORDER BY (points DESC);

Table ID for commit log replay to restore table data

Re-create a table with its original ID to facilitate restoring table data by replaying commit logs:

CREATE TABLE IF NOT EXISTS cycling.cyclist_emails (
  userid text PRIMARY KEY,
  id UUID,
  emails set<text>
)
WITH ID = '1bb7516e-b140-11e8-96f8-529269fb1459';

Result

CREATE TABLE cycling.cyclist_emails (
    userid text PRIMARY KEY,
    emails set<text>,
    id uuid
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

To retrieve a table’s ID, query the id column of system_schema.tables. For example:

SELECT id FROM system_schema.tables
WHERE keyspace_name = 'cycling'
  AND table_name = 'cyclist_emails';

Result

 id
--------------------------------------
 1bb7516e-b140-11e8-96f8-529269fb1459

(1 rows)

To perform a point-in-time restoration of a table, see the documentation for DSE 5.1.

Table with COMPACT STORAGE

In DSE 5.1, you can use WITH COMPACT STORAGE to create a table that is compatible with clients written to work with the legacy (Thrift) storage engine format:

CREATE TABLE sblocks (
  block_id uuid,
  subblock_id uuid,
  data blob,
  PRIMARY KEY (block_id, subblock_id)
)
WITH COMPACT STORAGE;

Using the WITH COMPACT STORAGE directive prevents you from defining more than one column that is not part of a compound primary key. A compact table with a primary key that is not compound can have multiple columns that are not part of the primary key.

A compact table that uses a compound primary key must define at least one clustering column. Columns cannot be added nor removed after creation of a compact table. Unless you specify WITH COMPACT STORAGE, CQL creates a table with non-compact storage.

Collections and static columns cannot be used with COMPACT STORAGE tables.

Create a table

Dynamic schema generation and schema collision

Prerequisites

Simple table with single partition key in current keyspace

Simple table with single partition key in another keyspace

Table with primary key separately specified

Table with name of mixed case

Table with multi-column (composite) partition key

Table with clustering columns

Table with a frozen UDT

Table with a geospatial type

Table with a CDC log

Storing data in descending order

Table ID for commit log replay to restore table data

Table with COMPACT STORAGE

See also

Was this helpful?

Give Feedback