CREATE TABLE
Creates a new table in the selected keyspace. Use IF NOT EXISTS to suppress the error message if the table already exists; in that case, no table is created. A static column can store the same data in multiple clustered rows of a partition, and then retrieve that data with a single SELECT statement.
Tables support a single counter column.
Synopsis
CREATE TABLE [ IF NOT EXISTS ] [keyspace_name.]table_name
  ( column_definition [ , ... ] | PRIMARY KEY (column_list) )
  [ WITH [ table_options ]
  [ [ AND ] CLUSTERING ORDER BY [ clustering_column_name order ] ]
  [ [ AND ] ID = 'table_hash_tag' ]
  [ [ AND ] COMPACT STORAGE ] ] ;
| Syntax conventions | Description |
|---|---|
| UPPERCASE | Literal keyword. |
| Lowercase | Not literal. |
| Italics | Variable value. Replace with a user-defined value. |
| [] | Optional. Square brackets ( [] ) surround optional command arguments. Do not type the square brackets. |
| ( ) | Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses. |
| \| | Or. A vertical bar ( \| ) separates alternative elements. Type any one of the elements. Do not type the vertical bar. |
| ... | Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required. |
| 'Literal string' | Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case. |
| { key : value } | Map collection. Braces ( { } ) enclose map collections or key-value pairs. A colon separates the key and the value. |
| <datatype1,datatype2> | Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma. |
| cql_statement; | End CQL statement. A semicolon ( ; ) terminates all CQL statements. |
| [--] | Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options. |
| ' <schema> ... </schema> ' | Search CQL only: single quotation marks ( ' ) surround an entire XML schema declaration. |
| @xml_entity='xml_entity_type' | Search CQL only: identify the entity and literal value to overwrite the XML element in the schema and solrConfig files. |
column_definition
Sets the column name, defines the data type, and optionally sets the column to static or primary key.
column_name cql_type_definition [STATIC | PRIMARY KEY] [, ...]
Restrictions:
- When PRIMARY KEY is at the end of a column definition, that column is the only primary key for the table.
- A table must have at least one PRIMARY KEY.
- A static column cannot be a primary key.
- Primary keys can include frozen collections.
- column_name
- Use a unique name for each column in the table. To preserve case or use special characters, enclose the name in double quotes.
- cql_type_definition
- Defines the type of data allowed in the column. See CQL data type or a user-defined type.
- STATIC
- Optional. The column has a single value.
- PRIMARY KEY
- When the PRIMARY KEY is one column, append PRIMARY KEY to the end of the column definition. This is the only schema information required to create a table. When there is one primary key, it is the partition key; the data is divided and stored by the unique values in this column: column_name cql_type_definition PRIMARY KEY. Alternatively, you can declare a primary key consisting of only one column in the same way as you declare a compound primary key.
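As an illustrative sketch of the two equivalent single-column primary key declarations (the table names are hypothetical; the cycling keyspace is assumed to exist):

```sql
-- PRIMARY KEY appended to the column definition:
CREATE TABLE IF NOT EXISTS cycling.comments_a (
  id UUID PRIMARY KEY,
  comment text
);

-- Equivalent declaration using the compound-key syntax:
CREATE TABLE IF NOT EXISTS cycling.comments_b (
  id UUID,
  comment text,
  PRIMARY KEY (id)
);
```

Both tables partition data by the unique values of id.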
PRIMARY KEY (column_list)
Uniquely identifies rows, determines storage partitions, and orders data (clustering) within the partition. A NULL value cannot be inserted into a PRIMARY KEY column. This restriction applies to both partition keys and clustering columns.
- column_list
- Defines the partition and clustering columns, which affect how the data is stored.
- Compound primary key: the first column is the partition key, and the additional columns are clustering keys. Syntax: PRIMARY KEY (partition_column_name, clustering_column_name [, ...])
- Composite partition key: multiple columns in the partition key. Enclose the partition key columns in parentheses. Syntax: PRIMARY KEY ((partition_column_name [, ...]), clustering_column_name [, ...])
table_options
CQL table properties and descriptions of the syntax. Tunes data handling, including I/O operations, compression, and compaction.
Note: The cassandra.yaml file is located in the installation_location/conf directory.
Table property options use the following syntax:
- Single values: option_name = 'value'
- Multiple values: option_name = { 'subproperty' : 'value' [, ...] } [AND ...]
  (Simple JSON format: key-value pairs in a comma-separated list enclosed by curly brackets.)
In a WITH clause, separate multiple table property options with AND, for example:
ALTER TABLE [keyspace_name.]table_name WITH option_name = 'value' AND option_name = {option_map};
- bloom_filter_fp_chance = N
- False-positive probability for the SSTable bloom filter. When a client requests data, the bloom filter checks if the row exists before executing disk I/O. Values range from 0 to 1.0, where 0 is the minimum value, enabling the largest possible bloom filter (uses the most memory), and 1.0 is the maximum value, disabling the bloom filter. Tip: Recommended setting: 0.1. A higher value yields diminishing returns. Default: bloom_filter_fp_chance = '0.01'
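As a sketch, a table could apply the recommended 0.1 setting like this (the table and columns are hypothetical):

```sql
CREATE TABLE IF NOT EXISTS cycling.popular_count (
  id UUID PRIMARY KEY,
  popularity int
) WITH bloom_filter_fp_chance = 0.1;
```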
- caching = { 'keys' : 'value', 'rows_per_partition' : 'value' }
- Optimizes the use of cache memory without manual tuning. Weighs the cached data by size and access frequency. Coordinate this setting with the global caching properties in the cassandra.yaml file. Valid values:
  - ALL: all primary keys or rows
  - NONE: no primary keys or rows
  - N (rows_per_partition only): specify a whole number
  Default: { 'keys': 'ALL', 'rows_per_partition': 'NONE' }
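For example, a hypothetical table could cache all partition keys plus the first 10 rows of each partition, using the valid values listed above:

```sql
CREATE TABLE IF NOT EXISTS cycling.race_events (
  race_id UUID,
  event_time timestamp,
  detail text,
  PRIMARY KEY (race_id, event_time)
) WITH caching = { 'keys' : 'ALL', 'rows_per_partition' : '10' };
```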
- cdc = TRUE | FALSE
- Creates a Change Data Capture (CDC) log on the table. Valid values:
  - TRUE: create a CDC log
  - FALSE: do not create a CDC log
- comment = 'some text that describes the table'
- Provides documentation on the table. Tip: Enter a description of the types of queries the table was designed to satisfy.
- dclocal_read_repair_chance
- Probability, between 0 and 1, that a successful read operation triggers a read repair. Unlike the repair controlled by read_repair_chance, this repair is limited to replicas in the same DC as the coordinator. Default value: 0.01. Note: In DataStax Distribution of Apache Cassandra™ (DDAC), dclocal_read_repair_chance is deprecated and should be set to 0.0.
- default_time_to_live
- TTL (Time To Live) in seconds, where zero is disabled. The maximum configurable value is 630720000 (20 years). Beginning in 2018, the expiration timestamp can exceed the maximum value supported by the storage engine; see the warning below. If the value is greater than zero, TTL is enabled for the entire table and an expiration timestamp is added to each column. A new TTL timestamp is calculated each time the data is updated, and the row is removed after all the data expires. Default value: 0 (disabled). Warning: The database storage engine can only encode TTL timestamps through January 19 2038 03:14:07 UTC due to the Year 2038 problem. The TTL date overflow policy determines whether requests with expiration timestamps later than the maximum date are rejected or inserted. See -Dcassandra.expiration_date_overflow_policy=POLICY.
- gc_grace_seconds
- Seconds after data is marked with a tombstone (deletion marker) before it is eligible for garbage collection. Default value: 864000 (10 days). The default value allows time for the database to maximize consistency prior to deletion. In a single-node cluster, this property can safely be set to zero. You can also reduce this value for tables whose data is not explicitly deleted; for example, tables containing only data with TTL set, or tables with default_time_to_live set. However, if you lower the gc_grace_seconds value, consider its interaction with these operations:
  - Hint replays: When a node goes down and then comes back up, other nodes replay the write operations (called hints) that were queued for that node while it was unresponsive. The database does not replay hints older than gc_grace_seconds after creation. The max_hint_window_in_ms setting in the cassandra.yaml file sets the time limit (3 hours by default) for collecting hints for the unresponsive node.
  - Batch replays: Like hint queues, batch operations store database mutations that are replayed in sequence. As with hints, the database does not replay a batched mutation older than gc_grace_seconds after creation. If your application uses batch operations, consider the possibility that decreasing gc_grace_seconds increases the chance that a batched write operation may restore deleted data. The batchlog_replay_throttle_in_kb property in the cassandra.yaml file gives some control of the batch replay process. The most important factors, however, are the size and scope of the batches you use.
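As a sketch of the TTL-only case described above (the table and values are hypothetical, not recommendations), a table whose rows always expire via default_time_to_live can safely use a shorter grace period than the 10-day default:

```sql
CREATE TABLE IF NOT EXISTS cycling.sensor_readings (
  sensor_id UUID,
  reading_time timestamp,
  reading double,
  PRIMARY KEY (sensor_id, reading_time)
) WITH default_time_to_live = 604800  -- rows expire after 7 days
  AND gc_grace_seconds = 86400;       -- 1 day instead of the 864000 default
```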
- memtable_flush_period_in_ms
- Milliseconds before memtables associated with the table are flushed. When memtable_flush_period_in_ms = 0, the memtable flushes when:
  - the flush threshold is met
  - on shutdown
  - on nodetool flush
  - when commitlogs get full
  Default: 0
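For example, to force a flush at least once an hour on an existing table (the interval here is illustrative; the table is from the examples later in this topic):

```sql
ALTER TABLE cycling.cyclist_name WITH memtable_flush_period_in_ms = 3600000;
```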
- min_index_interval
- Minimum gap between index entries in the index summary. A lower min_index_interval means the index summary contains more entries from the index, which allows the database to search fewer index entries to execute a read. A larger index summary may also use more memory. The value for min_index_interval is the densest possible sampling of the index.
- max_index_interval
- If the total memory usage of all index summaries reaches this value, Cassandra decreases the index summaries for the coldest SSTables to the maximum set by max_index_interval. The max_index_interval is the sparsest possible sampling in relation to memory pressure.
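Both intervals can be set together in one statement; the values below are illustrative, not recommendations:

```sql
ALTER TABLE cycling.cyclist_name
WITH min_index_interval = 128
AND max_index_interval = 2048;
```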
- read_repair_chance
- The probability that a successful read operation triggers a read repair. Unlike the repair controlled by dclocal_read_repair_chance, this repair is not limited to replicas in the same DC as the coordinator. The value must be between 0 and 1. Default value: 0.0. Note: In Cassandra, read_repair_chance is deprecated and should be set to 0.0.
- speculative_retry
- Overrides the normal read timeout when read_repair_chance is not 1.0, sending another read request. Specify the value as a number followed by a type, ms (milliseconds) or percentile. For example, speculative_retry = '3ms'. Configures rapid read protection. Normal read requests are sent to just enough replica nodes to satisfy the consistency level. In rapid read protection, extra read requests are sent to other replicas, even after the consistency level has been met. The speculative retry property specifies the trigger for these extra read requests.
  - ALWAYS: The coordinator node sends extra read requests to all other replicas after every read of that table.
  - Xpercentile: Tracks each table's typical read latency (in milliseconds). The coordinator node retrieves the typical latency time of the table being read and calculates X percent of that figure. The coordinator sends redundant read requests if the number of milliseconds it waits without responses exceeds that calculated figure. For example, if the speculative_retry property for Table_A is set to 80percentile, and that table's typical latency is 60 milliseconds, the coordinator node handling a read of Table_A sends a normal read request first, and sends out redundant read requests if it receives no responses within 48 ms (80% of 60 ms).
  - Nms: The coordinator node sends extra read requests to all other replicas if the coordinator node has not received any responses within N milliseconds.
  - NONE: The coordinator node does not send extra read requests after any read of that table.
compression = { compression_map }
Sets table compression. Configure the compression_map by specifying the compression algorithm class followed by the subproperties in simple JSON format. Custom compressor classes must implement the org.apache.cassandra.io.compress.ICompressor interface.
compression = {
  'class' : 'compression_algorithm_name'
  [, 'chunk_length_in_kb' : 'value']
  [, 'crc_check_chance' : 'value']
  | 'sstable_compression' : '' }
- class
- Sets the compressor name. DataStax Distribution of Apache Cassandra™ (DDAC) provides the following built-in classes:
  - LZ4Compressor
  - SnappyCompressor
  - DeflateCompressor
  Important: Use only compression implementations bundled with DDAC.
  Choosing the right compressor depends on your requirements for space savings over read performance. LZ4 is fastest to decompress, followed by Snappy, then by Deflate. Compression effectiveness is inversely correlated with decompression speed. The extra compression from Deflate or Snappy is not enough to make up for the decreased performance for general-purpose workloads, but for archival data they may be worth considering. Default: LZ4Compressor.
- chunk_length_in_kb
- Size (in KB) of the block. On disk, SSTables are compressed by block to allow random reads. Values larger than the default might improve the compression rate but increase the minimum size of data read from disk when a read occurs. The default value is a good middle ground for compressing tables. Adjust compression size to account for read/write access patterns (how much data is typically requested at once) and the average size of rows in the table. Default: 64.
- crc_check_chance
- When compression is enabled, each compressed block includes a checksum of that block to detect disk bitrot and avoid propagating corruption to other replicas. This option defines the probability with which those checksums are checked during reads. By default they are always checked. Set to 0 to disable checksum checking, or to 0.5, for instance, to check them on every other read. Default: 1.0.
- sstable_compression
- Disables compression. Specify a null value.
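Putting the subproperties together, a hypothetical archival table might trade read speed for better compression with Deflate:

```sql
CREATE TABLE IF NOT EXISTS cycling.race_archive (
  race_id UUID PRIMARY KEY,
  results text
) WITH compression = {
  'class' : 'DeflateCompressor',
  'chunk_length_in_kb' : '64' };
```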
compaction = { compaction_map }
Constructs a map of the compaction option and its subproperties. Defines the strategy for cleaning up data after writes. Define a compaction class and properties in simple JSON format:
compaction = { 'class' : 'compaction_strategy_name' [, 'subproperty_name' : 'value', ...] }
- compaction_strategy_name
- SizeTieredCompactionStrategy (STCS)
- Triggers a minor compaction when the table meets the min_threshold. Minor compactions do not involve all the tables in a keyspace. See SizeTieredCompactionStrategy. Important: Use only compaction implementations bundled with DDAC. This is the default compaction strategy.

Table 2. SizeTieredCompactionStrategy subproperties

| Subproperty | Default | Description |
|---|---|---|
| bucket_high | 1.5 | Size-tiered compaction merges sets of SSTables that are approximately the same size. The database compares each SSTable size to the average of all SSTable sizes for this table on the node. It merges SSTables whose sizes in KB are within [average-size × bucket_low] and [average-size × bucket_high]. |
| bucket_low | 0.5 | See bucket_high. |
| enabled | true | Enables background compaction. |
| log_all | false | Activates advanced logging for the entire cluster. |
| max_threshold | 32 | The maximum number of SSTables to allow in a minor compaction. |
| min_threshold | 4 | The minimum number of SSTables to trigger a minor compaction. |
| min_sstable_size | 50 MB | STCS groups SSTables into buckets. The bucketing process groups SSTables that differ in size by less than 50%. This bucketing process is too fine-grained for small SSTables. If your SSTables are small, use min_sstable_size to define a size threshold (in bytes) below which all SSTables belong to one unique bucket. |
| only_purge_repaired_tombstones | false | Setting to true allows purging tombstones only from repaired SSTables. This prevents data from resurrecting when repair is not run within gc_grace_seconds. If you do not run repair for a long time, the database keeps all tombstones, which can cause problems. |
| tombstone_compaction_interval | 86400 | Number of seconds before compaction can run on an SSTable after it is created. An SSTable is eligible for compaction when it exceeds the tombstone_threshold. Because it might not be possible to drop tombstones when doing a single SSTable compaction, and since the compaction is triggered based on an estimated tombstone ratio, this setting makes the minimum interval between two single-SSTable compactions tunable to prevent an SSTable from being constantly re-compacted. |
| tombstone_threshold | 0.2 | The ratio of garbage-collectable tombstones to all contained columns. If the ratio exceeds this limit, compaction starts only on that table to purge the tombstones. |
| unchecked_tombstone_compaction | false | Setting to true allows tombstone compaction to run without pre-checking which tables are eligible for this operation. Even without this pre-check, Cassandra checks an SSTable to make sure it is safe to drop tombstones. |

- DateTieredCompactionStrategy (deprecated)
- Use TimeWindowCompactionStrategy (TWCS) instead. Stores data written within a certain period of time in the same SSTable.

Table 3. DateTieredCompactionStrategy subproperties

| Subproperty | Default | Description |
|---|---|---|
| base_time_seconds | 3600 | The size of the first time window. |
| enabled | true | Enables background compaction. |
| log_all | false | Setting to true activates advanced logging for the entire cluster. |
| max_sstable_age_days (deprecated) | 1000 | Cassandra does not compact SSTables whose most recent data is older than this property. Fractional days can be set. |
| max_window_size_seconds | 86400 | The maximum window size in seconds. |
| max_threshold | 32 | The maximum number of SSTables allowed in a minor compaction. |
| min_threshold | 4 | The minimum number of SSTables that trigger a minor compaction. |
| timestamp_resolution | MICROSECONDS | Units, MICROSECONDS or MILLISECONDS, to match the timestamp of inserted data. |
| tombstone_compaction_interval | 86400 | The minimum number of seconds after an SSTable is created before the database considers the SSTable for tombstone compaction. An SSTable is eligible for tombstone compaction if the table exceeds the tombstone_threshold ratio. |
| tombstone_threshold | 0.2 | The ratio of garbage-collectable tombstones to all contained columns. If the ratio exceeds this limit, compaction starts only on that table to purge the tombstones. |
| unchecked_tombstone_compaction | false | Setting to true allows tombstone compaction to run without pre-checking which tables are eligible for this operation. Even without this pre-check, Cassandra checks an SSTable to make sure it is safe to drop tombstones. |

- TimeWindowCompactionStrategy (TWCS)
- Compacts SSTables using a series of time windows or buckets. TWCS creates a new time window within each successive time period. During the active time window, TWCS compacts all SSTables flushed from memory into larger SSTables using STCS. At the end of the time period, all of these SSTables are compacted into a single SSTable. Then the next time window starts and the process repeats. See TimeWindowCompactionStrategy. Note: All of the subproperties for STCS are also valid for TWCS.

Table 4. TimeWindowCompactionStrategy subproperties

| Subproperty | Default | Description |
|---|---|---|
| compaction_window_unit | days | Time unit used to define the bucket size. The value is based on the Java TimeUnit. For the list of valid values, see the Java API TimeUnit page at https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/TimeUnit.html. |
| compaction_window_size | 1 | Units per bucket. |
| log_all | false | Setting this to true activates advanced logging for the entire cluster. |

- LeveledCompactionStrategy (LCS)
- Creates SSTables of a fixed, relatively small size (160 MB by default) that are grouped into levels. Within each level, SSTables are guaranteed to be non-overlapping. Each level (L0, L1, L2, and so on) is 10 times as large as the previous. Disk I/O is more uniform and predictable on higher than on lower levels as SSTables are continuously being compacted into progressively larger levels. At each level, row keys are merged into non-overlapping SSTables in the next level. See LeveledCompactionStrategy (LCS).

Table 5. LeveledCompactionStrategy subproperties

| Subproperty | Default | Description |
|---|---|---|
| enabled | true | Enables background compaction. |
| log_all | false | Setting to true activates advanced logging for the entire cluster. |
| sstable_size_in_mb | 160 | The target size for SSTables that use the Leveled Compaction Strategy. Although SSTable sizes should be less than or equal to sstable_size_in_mb, compaction may produce a larger SSTable. This occurs when data for a given partition key is exceptionally large. The Cassandra database does not split the data into two SSTables. |
| tombstone_compaction_interval | 86400 (one day) | The minimum number of seconds after an SSTable is created before SSTable tombstone compaction. Tombstone compaction begins when the SSTable's tombstone_threshold exceeds the set value. |
| tombstone_threshold | 0.2 | The ratio of garbage-collectable tombstones to all contained columns. If the ratio exceeds this limit, compaction starts only on that table to purge the tombstones. |
| unchecked_tombstone_compaction | false | Setting to true allows tombstone compaction to run without pre-checking which tables are eligible for this operation. Even without this pre-check, Cassandra checks an SSTable to make sure it is safe to drop tombstones. |
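Tying the strategies together, a hypothetical time-series table might use TWCS with one-day buckets (the subproperty values are illustrative):

```sql
CREATE TABLE IF NOT EXISTS cycling.lap_times (
  race_id UUID,
  lap_time timestamp,
  seconds double,
  PRIMARY KEY (race_id, lap_time)
) WITH compaction = {
  'class' : 'TimeWindowCompactionStrategy',
  'compaction_window_unit' : 'DAYS',
  'compaction_window_size' : '1' };
```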
Table keywords
- CLUSTERING ORDER BY ( column_name ASC | DESC )
- Orders row storage to make use of the on-disk sorting of columns. Specifying order can make query results more efficient. Options are:
  - ASC: ascending (default order)
  - DESC: descending, reverse order
- COMPACT STORAGE
- Use COMPACT STORAGE to store data in the legacy (Thrift) storage engine format to conserve disk space. Important: In DataStax Distribution of Apache Cassandra™ (DDAC), the storage engine is much more efficient at storing data, and compact storage is not necessary.
- ID
- If a table is accidentally dropped with DROP TABLE, use this option to recreate the table, then run a commitlog replayer to retrieve the data.
Examples
Create a table that has a frozen user-defined type.
CREATE TABLE cycling.race_winners ( race_name text, race_position int, cyclist_name FROZEN<fullname>, PRIMARY KEY (race_name, race_position) );
See "Creating a user-defined type" for information on creating UDTs. In DataStax Distribution of Apache Cassandra™ (DDAC), UDTs can be created unfrozen if only non-collection fields are used in the user-defined type creation. If the table is created with an unfrozen UDT, then individual field values can be updated and deleted.
Creating a table
CREATE TABLE cycling.cyclist_name ( id UUID PRIMARY KEY, lastname text, firstname text );
Creating a compound primary key
CREATE TABLE cycling.cyclist_category ( category text, points int, id UUID, lastname text, PRIMARY KEY (category, points)) WITH CLUSTERING ORDER BY (points DESC);
Creating a composite partition key
CREATE TABLE cycling.rank_by_year_and_name ( race_year int, race_name text, cyclist_name text, rank int, PRIMARY KEY ((race_year, race_name), rank) );
Setting caching
The database caches only the first N rows in a partition, as determined by the clustering order.
ALTER MATERIALIZED VIEW cycling.cyclist_by_age WITH caching = { 'keys' : 'ALL', 'rows_per_partition' : 'ALL' };
Creating a table with a CDC log:
CREATE TABLE cycling.cyclist_name ( id UUID PRIMARY KEY, lastname text, firstname text ) WITH cdc = TRUE;
Adding a comment
ALTER MATERIALIZED VIEW cycling.cyclist_by_age WITH comment = 'Basetable: cyclist_mv';
Change the speculative retries
ALTER TABLE users WITH speculative_retry = '10ms';
ALTER TABLE users WITH speculative_retry = '99percentile';
Enabling and disabling background compaction
ALTER TABLE mytable WITH COMPACTION = { 'class': 'SizeTieredCompactionStrategy', 'enabled': 'false' };
Disabling background compaction can be harmful: without it, the database does not regain disk space, and may allow zombies to propagate. Although compaction uses I/O, it is better to leave it enabled in most cases.
Reading extended compaction logs
Set the log_all subproperty to true to collect in-depth information about compaction activity on a node in a dedicated log file. When extended compaction logging is enabled, the database creates a file named compaction-%d.log (where %d is a sequential number) in home/logs.
- type: enable
Lists SSTables that have been flushed previously.
{"type":"enable","keyspace":"test","table":"t","time":1470071098866,"strategies": [ {"strategyId":"0","type":"LeveledCompactionStrategy","tables":[],"repaired":true,"folders": ["/home/carl/oss/cassandra/bin/../data/data"]}, {"strategyId":"1","type":"LeveledCompactionStrategy","tables":[],"repaired":false,"folders": ["/home/carl/oss/cassandra/bin/../data/data"] } ] }
type: flush
Logs a flush event from a memtable to an SSTable on disk, including the CompactionStrategy for each table.
{"type":"flush","keyspace":"test","table":"t","time":1470083335639,"tables": [ {"strategyId":"1","table": {"generation":1,"version":"mb","size":106846362,"details": {"level":0,"min_token":"-9221834874718566760","max_token":"9221396997139245178"} } } ] }
type: compaction
Logs a compaction event.
{"type":"compaction","keyspace":"test","table":"t","time":1470083660267,"start":"1470083660188","end":"1470083660267","input": [ {"strategyId":"1","table": {"generation":1372,"version":"mb","size":1064979,"details": {"level":1,"min_token":"7199305267944662291","max_token":"7323434447996777057"} } } ],"output": [ {"strategyId":"1","table": {"generation":1404,"version":"mb","size":1064306,"details": {"level":2,"min_token":"7199305267944662291","max_token":"7323434447996777057"} } } ] }
type: pending
Lists the number of pending tasks for a compaction strategy.
{"type":"pending","keyspace":"test","table":"t","time":1470083447967,"strategyId":"1","pending":200}
Storing data in descending order
CREATE TABLE cycling.cyclist_category ( category text, points int, id UUID, lastname text, PRIMARY KEY (category, points)) WITH CLUSTERING ORDER BY (points DESC);
Using compact storage
CREATE TABLE cycling.cyclist_category (
category text,
points int,
id UUID,
lastname text,
PRIMARY KEY (category, points))
WITH CLUSTERING ORDER BY (points DESC)
AND COMPACT STORAGE;
Restoring from commit log replayer
CREATE TABLE cycling.cyclist_emails (
userid text PRIMARY KEY,
emails set<text>
) WITH ID='5a1c395e-b41f-11e5-9f22-ba0be0483c18';
To find the table ID, query the id column of system_schema.tables. For example:
SELECT id
FROM system_schema.tables
WHERE keyspace_name = 'cycling'
AND table_name = 'cyclist_emails';
To perform a point-in-time restoration of the table, see Restoring a backup to a specific point-in-time.