Create a table
To create a table, you must define the table name, column names, column data types, and a primary key. The cqlsh
command-line tool is the primary tool for interacting with CQL to define schema.
Add the optional WITH
clause and keyword arguments to configure table properties (caching, compaction, etc.) as described in the table options.
Prerequisites
-
Understand the use and importance of the primary key in table schema.
-
Determine the data types that each column will require.
-
Start
cqlsh
.
Keyspace and table name conventions: The name of a keyspace or a table is a string of alphanumeric characters and underscores, but it must begin with a letter.
If case must be maintained, the name must be encased in double quotes, such as Since tables are defined within a keyspace, you can either use the keyspace as part of the table creation command, or create a table in the current keyspace.
To specify the keyspace as part of a table name, use the keyspace name, a period ( |
Simple table - single partition key in current keyspace
To create a table in the current keyspace, you can specify just the new table name:
CREATE TABLE cycling.cyclist_name (
id UUID PRIMARY KEY,
lastname text,
firstname text
);
Results
CREATE TABLE cycling.cyclist_name (
id uuid PRIMARY KEY,
firstname text,
lastname text
) WITH additional_write_policy = '99PERCENTILE'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND nodesync = {'enabled': 'true', 'incremental': 'true'}
AND read_repair = 'BLOCKING'
AND speculative_retry = '99PERCENTILE';
In this example, first the cycling
keyspace is selected, and then a table is created with three columns, the first of which is defined as a single PRIMARY KEY
.
That column, id
, is a single partition key.
Simple table - single partition key in another keyspace
To create a table in another keyspace, you can specify both the keyspace and new table name:
CREATE TABLE cycling.cyclist_name (
id UUID PRIMARY KEY,
lastname text,
firstname text
);
Results
CREATE TABLE cycling.cyclist_name (
id uuid PRIMARY KEY,
firstname text,
lastname text
) WITH additional_write_policy = '99PERCENTILE'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND nodesync = {'enabled': 'true', 'incremental': 'true'}
AND read_repair = 'BLOCKING'
AND speculative_retry = '99PERCENTILE';
The difference between the last example and this one is just that the keyspace name is used in the table creation.
Table with primary key separately specified
To define the PRIMARY KEY
with one or more columns, add an additional item to the table creation:
CREATE TABLE cycling.cyclist_name (
id UUID,
lastname text,
firstname text,
PRIMARY KEY (id)
);
Results
CREATE TABLE cycling.cyclist_name (
id uuid PRIMARY KEY,
firstname text,
lastname text
) WITH additional_write_policy = '99PERCENTILE'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND nodesync = {'enabled': 'true', 'incremental': 'true'}
AND read_repair = 'BLOCKING'
AND speculative_retry = '99PERCENTILE';
The difference between the last example and this one is just that the PRIMARY KEY
is specified separately.
Table with name of mixed case
To name a table with a mixed case, use double quotes:
CREATE TABLE cycling."addThis" (
a int PRIMARY KEY
);
Results
CREATE TABLE cycling.addthis (
a int PRIMARY KEY
) WITH additional_write_policy = '99PERCENTILE'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND nodesync = {'enabled': 'true', 'incremental': 'true'}
AND read_repair = 'BLOCKING'
AND speculative_retry = '99PERCENTILE';
Note the double quotes ("…"
) around the table name, to ensure the table is created with the mixed case name.
Unless you need keyspaces or tables with uppercase or mixed case names, simplify by using lowercase only. If your keyspace or table creation commands use uppercase without double quotes, the name will be created with lowercase only. |
Table - multi-column (composite) partition key
To define the PRIMARY KEY
with a partition key of one or more columns:
CREATE TABLE IF NOT EXISTS cycling.rank_by_year_and_name (
race_year int,
race_name text,
cyclist_name text,
rank int,
PRIMARY KEY ((race_year, race_name), rank)
);
Results
CREATE TABLE cycling.rank_by_year_and_name (
race_year int,
race_name text,
rank int,
cyclist_name text,
PRIMARY KEY ((race_year, race_name), rank)
) WITH CLUSTERING ORDER BY (rank ASC)
AND additional_write_policy = '99PERCENTILE'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND nodesync = {'enabled': 'true', 'incremental': 'true'}
AND read_repair = 'BLOCKING'
AND speculative_retry = '99PERCENTILE';
In this example, race_year
and race_name
are the two columns that comprise the composite partition key. They are encased in parentheses, inside the PRIMARY KEY
definition. The column rank
is an additional column that is a clustering column, as the next section describes.
Table - clustering columns
The last example showed one example of using a clustering column. Clustering columns in a primary key are all columns listed that are not part of the partition key. The partition key, as shown above, is either the first column defined in the primary key, or a composite partition key encased in parentheses.
Here is another example to make clear the parts of a primary key:
CREATE TABLE IF NOT EXISTS cycling.cyclist_category (
category text,
points int,
id UUID,
lastname text,
PRIMARY KEY (category, points)
)
WITH CLUSTERING ORDER BY (points DESC);
Results
CREATE TABLE cycling.cyclist_category (
category text,
points int,
id uuid,
lastname text,
PRIMARY KEY (category, points)
) WITH CLUSTERING ORDER BY (points DESC)
AND additional_write_policy = '99PERCENTILE'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND nodesync = {'enabled': 'true', 'incremental': 'true'}
AND read_repair = 'BLOCKING'
AND speculative_retry = '99PERCENTILE';
In this example, The category
column is the partition key, and the points
column is a clustering column.
It is important to remember that the primary key must uniquely identify a row within the cyclist_category
table.
More than one row with the same category
can exist, as long as that row contains a different points
value.
All the rows with the same category
will sort by the points
values in descending order due to the addition of the table option WITH CLUSTER ORDER BY
.
Normally, columns are sorted in ascending alphabetical order.
Table with a frozen UDT
Create the race_winners
table that has a frozen user-defined type (UDT):
CREATE TABLE IF NOT EXISTS cycling.race_winners (
cyclist_name FROZEN<fullname>,
race_name text,
race_position int,
PRIMARY KEY (race_name, race_position)
);
Results
CREATE TABLE cycling.race_winners (
race_name text,
race_position int,
cyclist_name frozen<fullname>,
PRIMARY KEY (race_name, race_position)
) WITH CLUSTERING ORDER BY (race_position ASC)
AND additional_write_policy = '99PERCENTILE'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND nodesync = {'enabled': 'true', 'incremental': 'true'}
AND read_repair = 'BLOCKING'
AND speculative_retry = '99PERCENTILE';
See Creating a user-defined type for information on creating UDTs. UDTs can be created unfrozen if only non-collection fields are used in the user-defined type creation. If the table is created with an unfrozen UDT, then individual field values can be updated and deleted.
Table with a geospatial type
CREATE TABLE cycling.geospatial (
id text PRIMARY KEY,
point 'PointType',
linestring 'LineStringType'
);
Results
CREATE TABLE cycling.geospatial (
id text PRIMARY KEY,
linestring 'org.apache.cassandra.db.marshal.LineStringType',
point 'org.apache.cassandra.db.marshal.PointType'
) WITH additional_write_policy = '99PERCENTILE'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND nodesync = {'enabled': 'true', 'incremental': 'true'}
AND read_repair = 'BLOCKING'
AND speculative_retry = '99PERCENTILE';
Table with a CDC log
Create a change data capture log for the cyclist_id
table:
CREATE TABLE IF NOT EXISTS cycling.cyclist_id (
lastname text,
firstname text,
age int,
id UUID,
PRIMARY KEY ((lastname, firstname), age)
);
Results
CREATE TABLE cycling.cyclist_id (
lastname text,
firstname text,
age int,
id uuid,
PRIMARY KEY ((lastname, firstname), age)
) WITH CLUSTERING ORDER BY (age ASC)
AND additional_write_policy = '99PERCENTILE'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND nodesync = {'enabled': 'true', 'incremental': 'true'}
AND read_repair = 'BLOCKING'
AND speculative_retry = '99PERCENTILE';
CDC logging must be enabled in cassandra.yaml.
Before enabling CDC logging, have a plan for moving and consuming the log information. After the disk space limit is reached, writes to CDC-enabled tables are rejected until more space is freed. See Change-data-capture (CDC) space settings for information about available CDC settings. |
Storing data in descending order
The following example shows a table definition that stores the categories with the highest points first.
CREATE TABLE IF NOT EXISTS cycling.cyclist_category (
category text,
points int,
id UUID,
lastname text,
PRIMARY KEY (category, points)
)
WITH CLUSTERING ORDER BY (points DESC);
Table ID for commit log replay to restore table data
Recreate a table with its original ID to facilitate restoring table data by replaying commit logs:
CREATE TABLE IF NOT EXISTS cycling.cyclist_emails (
userid text PRIMARY KEY,
id UUID,
emails set<text>
)
WITH ID = '1bb7516e-b140-11e8-96f8-529269fb1459';
Results
CREATE TABLE cycling.cyclist_emails (
userid text PRIMARY KEY,
emails set<text>,
id uuid
) WITH additional_write_policy = '99PERCENTILE'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND nodesync = {'enabled': 'true', 'incremental': 'true'}
AND read_repair = 'BLOCKING'
AND speculative_retry = '99PERCENTILE';
To retrieve a table’s ID, query the id
column of system_schema.tables
.
For example:
SELECT id FROM system_schema.tables
WHERE keyspace_name = 'cycling'
AND table_name = 'cyclist_emails';
Results
id
--------------------------------------
1bb7516e-b140-11e8-96f8-529269fb1459
(1 rows)
To perform a point-in-time restoration of the table, see Restoring a backup to a specific point-in-time.
Related information: