CREATE TABLE
Creates a new table in the selected keyspace.
Use IF NOT EXISTS to suppress the error if the table already exists; in that case, no table is created.
Syntax
create_table_statement::= CREATE TABLE [ IF NOT EXISTS ] table_name '('
column_definition ( ',' column_definition )*
[ ',' PRIMARY KEY '(' primary_key ')' ]
')' [ WITH table_options ]
column_definition::= column_name cql_type [ STATIC ] [ column_mask ] [ PRIMARY KEY ]
column_mask::= MASKED WITH ( DEFAULT | function_name '(' term ( ',' term )* ')' )
primary_key::= partition_key [ ',' clustering_columns ]
partition_key::= column_name | '(' column_name ( ',' column_name )* ')'
clustering_columns::= column_name ( ',' column_name )*
table_options::= COMPACT STORAGE [ AND table_options ]
                 | CLUSTERING ORDER BY '(' clustering_order ')' [ AND table_options ]
                 | options
clustering_order::= column_name (ASC | DESC) ( ',' column_name (ASC | DESC) )*
CREATE TABLE [ IF NOT EXISTS ] [<keyspace_name>.]<table_name> (
   <column_definition> [ , ... ]
   [ , PRIMARY KEY ( <partition_key_column_name> [ , <clustering_column_name> ... ] ) ] )
   [ WITH COMPACT STORAGE ]                                                              (1)
   [ AND <table_options> ]
   [ CLUSTERING ORDER BY ( <clustering_column_name> [ ASC | DESC ] [ , ... ] ) ]         (2)
   [ AND ( VERTEX LABEL <vl_name> | EDGE LABEL <el_name> FROM <vl_name> TO <vl_name> ) ] (DSE 6.8 only)
   [ [ AND ] ID = '<table_hash_tag>' ] ;                                                 (3)
1 | COMPACT STORAGE is only supported in DSE 5.1 and earlier. Use it only when migrating to a later DSE version. |
2 | The CLUSTERING ORDER BY clause is optional. If not specified, the default order is ASC. |
3 | The ID clause is optional. If specified, the table is created with a hash tag that identifies the table in the schema. If a table is accidentally dropped, use this option to recreate the table and run a commit log replay to retrieve the data. |
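For example, a minimal statement that exercises the optional clauses above (the cycling keyspace, table, and columns are illustrative, not from this reference):

CREATE TABLE IF NOT EXISTS cycling.cyclist_races (
   id uuid,
   race_date date,
   race_name text,
   PRIMARY KEY (id, race_date)
) WITH CLUSTERING ORDER BY (race_date DESC);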
Syntax legend
Syntax conventions | Description
---|---
UPPERCASE | Literal keyword.
Lowercase | Not literal.
Italics | Variable value. Replace with a user-defined value.
[ ] | Optional. Square brackets enclose optional command arguments. Do not type the square brackets.
( ) | Group. Parentheses identify a group to choose from. Do not type the parentheses.
\| | Or. A vertical bar separates alternative elements. Type any one of the elements; do not type the vertical bar.
... | Repeatable. An ellipsis indicates that you can repeat the syntax element as often as required.
' ' | Single quotation marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.
{ key : value } | Map collection. Braces enclose map collections or key-value pairs. A colon separates the key and the value.
< > | Set, list, map, or tuple. Angle brackets enclose data types in a set, list, map, or tuple. Separate the data types with a comma.
; | End CQL statement. A semicolon terminates all CQL statements.
-- | Separate the command line options from the command arguments with two hyphens. This syntax is useful when arguments might be mistaken for command line options.
' <schema> ... </schema> ' | Search CQL only: Single quotation marks surround an entire XML schema declaration.
@xml_entity='xml_entity_type' | Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrConfig files.
Parameters
Parameter | Description | Default
---|---|---
keyspace_name | Optional. Name of the keyspace that contains the table. | If no name is specified, the current keyspace is used.
table_name | Name of the table. |
column_definition | List of column definitions: each column name followed by its CQL data type, optionally flagged as STATIC or PRIMARY KEY. |
column_name | Name of the column. |
cql_type | CQL data type. See CQL data types. |
primary_key | List of columns that make up the primary key. The first column is the partition key. If there is more than one column in the partition key, enclose the columns in parentheses. Additional columns are clustering columns. |
partition_key_column_name | Name of the partition key column. One or more partition key columns are required. |
clustering_column_name | Optional. Name of the clustering column. One or more clustering columns can be defined. Optionally set the sort direction for each clustering column in the CLUSTERING ORDER BY clause. |
Options
Table options are specified with the WITH keyword. They tune data handling, including I/O operations, compression, and compaction. Specify a single value for each option. Some options have sub-properties, which are specified in a simple JSON format: key-value pairs in a comma-separated list enclosed in curly braces.
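For example, the following statement sets a single-value option and an option with JSON-style sub-properties (the keyspace, table, and values are illustrative):

CREATE TABLE IF NOT EXISTS cycling.comments (
   record_id timeuuid PRIMARY KEY,
   commenter text,
   comment text
) WITH comment = 'Rider comments'
   AND gc_grace_seconds = 864000
   AND compaction = { 'class' : 'SizeTieredCompactionStrategy', 'min_threshold' : 4 };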
Option | Description | Default
---|---|---
additional_write_policy | Same as speculative_retry, but for additional write requests sent to replicas. | 99p
auto_snapshot | Create snapshots automatically before keyspace truncation or dropping a table. | true
bloom_filter_fp_chance | False-positive probability for SSTable bloom filter. When a client requests data, the bloom filter checks if the row exists before executing disk I/O. Values range from 0 to 1.0, where 0 produces the fewest false positives but uses the most memory, and 1.0 disables the bloom filter. | 0.01 (0.1 for LeveledCompactionStrategy)
caching | Optimizes the use of cache memory without manual tuning. Weighs the cached data by size and access frequency. Coordinate this setting with the global caching properties in the cassandra.yaml file. Valid values: keys: ALL or NONE; rows_per_partition: ALL, NONE, or the number of rows to cache. | { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' }
cdc | Creates a Change Data Capture (CDC) log on the table. Valid values: true and false. | false
comment | Provide comments that pertain to the table. | '' (empty)
compaction | Configures the compaction strategy. See Compaction sub-properties for more information. | SizeTieredCompactionStrategy
compression | Configures the compression algorithm. See Compression sub-properties for more information. | LZ4Compressor
crc_check_chance | Probability that a read request will check the CRC of a row. | 1.0
default_time_to_live | TTL (Time To Live) in seconds, where zero is disabled. The maximum configurable value is 630720000 (20 years). If the value is greater than zero, TTL is enabled for the entire table and an expiration timestamp is added to each column. A new TTL timestamp is calculated each time the data is updated, and the row is removed after all the data expires. | 0
extensions | Specifies the table extensions. | {} (empty map)
gc_grace_seconds | Seconds after data is marked with a tombstone (deletion marker) before it is eligible for garbage collection. In a single-node cluster, this property can safely be set to zero. You can also reduce this value for tables whose data is not explicitly deleted; for example, tables containing only data with TTL set, or tables with default_time_to_live set. | 864000 (10 days)
incremental_backups | Enable incremental backups for the table. | true
index_summary_capacity_in_mb | If the total memory usage of all index summaries reaches this value, CQL decreases the index summaries for the coldest SSTables to the maximum set by max_index_interval. | 5% of the heap size
max_index_interval | Maximum gap between index entries in the index summary. | 2048
memtable_flush_period_in_ms | Milliseconds before memtables associated with the table are flushed. | 0
min_index_interval | Minimum gap between index entries in the index summary. A lower min_index_interval means the index summary contains more entries from the index file, which requires more memory. | 128
read_repair | Sets read repair behavior. Choices are BLOCKING and NONE. | BLOCKING
speculative_retry | Configures rapid read protection. Normal read requests are sent to just enough replica nodes to satisfy the consistency level. In rapid read protection, extra read requests are sent to other replicas, even after the consistency level has been met. The speculative retry property specifies the trigger for these extra read requests. For example, if speculative_retry is set to 99percentile, the coordinator sends extra read requests when the original requests have not been answered within the table's 99th-percentile read latency. | 99PERCENTILE
Compression sub-properties
Configure by specifying the compression algorithm class
followed by the sub-properties in simple JSON format.
Custom compression classes can be implemented using the org.apache.cassandra.io.compress.ICompressor
interface.
Use only compression class implementations bundled with DSE.
compression = {
   [ 'class' : '<compression_algorithm_name>',
     'chunk_length_in_kb' : '<value>',
     'crc_check_chance' : '<value>' ]
   | [ 'sstable_compression' : '' ]
}
Property | Description | Default
---|---|---
class | Sets the compressor name. CQL provides the following built-in classes: LZ4Compressor, SnappyCompressor, and DeflateCompressor. Choosing the right compressor depends on your requirements for space savings over read performance. LZ4 is fastest to decompress, followed by Snappy, then by Deflate. Compression effectiveness is inversely correlated with decompression speed. The extra compression from Deflate or Snappy is not enough to make up for the decreased performance for general-purpose workloads, but for archival data they may be worth considering. | LZ4Compressor
chunk_length_in_kb | Size (in KB) of the block. On disk, SSTables are compressed by block to allow random reads. Values larger than the default might improve the compression rate, but increase the minimum amount of data read from disk during a read. The default value is a good middle ground for compressing tables. Adjust the compression chunk size to account for read/write access patterns (how much data is typically requested at once) and the average size of rows in the table. | 64
crc_check_chance | When compression is enabled, each compressed block includes a checksum of that block to detect disk bit rot and avoid propagating corruption to other replicas. This option defines the probability that those checksums are verified during a read. By default they are always checked. Set to 0 to disable checksum checking, or to 0.5, for instance, to check them on every other read. | 1.0
sstable_compression | Disables compression. Specify a null value. |
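A sketch of a table created with explicit compression sub-properties (the keyspace, table, and chunk size are illustrative):

CREATE TABLE IF NOT EXISTS cycling.race_results (
   race_id uuid PRIMARY KEY,
   results blob
) WITH compression = { 'class' : 'LZ4Compressor', 'chunk_length_in_kb' : 64 };

Larger chunk lengths tend to improve the compression ratio at the cost of reading more data per request; tune against your access patterns as described above.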
Compaction sub-properties
Configure by specifying the compaction algorithm class
followed by the sub-properties in simple JSON format.
Use only compaction implementations bundled with DSE.
See How is data maintained? for more details.
compaction = {
'class' : '<compaction_strategy_name>',
'<property_name>' : <value> [, ...] }
where the <compaction_strategy_name> is SizeTieredCompactionStrategy, TimeWindowCompactionStrategy, or LeveledCompactionStrategy.
Common properties
The following properties apply to all compaction strategies.
compaction = {
'class' : 'compaction_strategy_name',
'enabled' : (true | false),
'log_all' : (true | false),
'only_purge_repaired_tombstone' : (true | false),
'tombstone_threshold' : <ratio>,
'tombstone_compaction_interval' : <sec>,
'unchecked_tombstone_compaction' : (true | false),
'min_threshold' : <num_sstables>,
'max_threshold' : <num_sstables> }
Property | Description | Default
---|---|---
enabled | Enable background compaction. Use true to run minor compactions, or false to disable them. | true
log_all | Activates advanced logging for the entire cluster. | false
only_purge_repaired_tombstone | Enabling this property prevents data from resurrecting when repair is not run within the gc_grace_seconds period; when enabled, tombstones are purged only from repaired SSTables. | false
tombstone_threshold | The ratio of garbage-collectable tombstones to all contained columns. If the ratio exceeds this limit, compaction starts only on that table to purge the tombstones. | 0.2
tombstone_compaction_interval | Number of seconds before compaction can run on an SSTable after it is created. An SSTable is eligible for compaction when it exceeds the tombstone_threshold. | 86400 (1 day)
unchecked_tombstone_compaction | Setting to true allows tombstone compaction to run on an SSTable without first checking whether it is likely to succeed. | false
min_threshold | The minimum number of SSTables to trigger a minor compaction. Restriction: Not used in LeveledCompactionStrategy. | 4
max_threshold | The maximum number of SSTables before a minor compaction is triggered. Restriction: Not used in LeveledCompactionStrategy. | 32
SizeTieredCompactionStrategy
The compaction class SizeTieredCompactionStrategy (STCS) is the default compaction strategy. STCS triggers a minor compaction when the number of similarly sized SSTables on the node reaches min_threshold. Minor compactions do not involve all the tables in a keyspace.
See SizeTieredCompactionStrategy (STCS).
The following properties only apply to SizeTieredCompactionStrategy:
compaction = {
'class' : 'SizeTieredCompactionStrategy',
'bucket_high' : <factor>,
'bucket_low' : <factor>,
'min_sstable_size' : <int> }
Property | Description | Default
---|---|---
bucket_high | Size-tiered compaction merges sets of SSTables that are approximately the same size. The database compares each SSTable size to the average of all SSTable sizes for this table on the node. It merges SSTables whose size in KB falls within [average-size * bucket_low] and [average-size * bucket_high]. | 1.5
bucket_low | See bucket_high. bucket_low sets the lower bound of the size range: SSTables smaller than [average-size * bucket_low] fall outside the bucket. | 0.5
min_sstable_size | STCS groups SSTables into buckets. The bucketing process groups SSTables that differ in size by less than 50%. This bucketing process is too fine-grained for small SSTables. If your SSTables are small, use this option to define a size threshold in MB below which all SSTables belong to one unique bucket. | 50
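A sketch showing the STCS sub-properties on a table definition (the keyspace and table are hypothetical; the values shown are the defaults):

CREATE TABLE IF NOT EXISTS cycling.events (
   event_id uuid PRIMARY KEY,
   payload text
) WITH compaction = {
   'class' : 'SizeTieredCompactionStrategy',
   'min_threshold' : 4,
   'bucket_low' : 0.5,
   'bucket_high' : 1.5,
   'min_sstable_size' : 50 };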
TimeWindowCompactionStrategy
The compaction class TimeWindowCompactionStrategy
(TWCS) compacts SSTables using a series of time windows or buckets.
TWCS creates a new time window within each successive time period.
During the active time window, TWCS compacts all SSTables flushed from memory into larger SSTables using STCS.
At the end of the time period, all of these SSTables are compacted into a single SSTable.
Then the next time window starts and the process repeats.
See TimeWindowCompactionStrategy (TWCS).
All of the properties for STCS are also valid for TWCS.
The following properties apply only to TimeWindowCompactionStrategy:
compaction = {
  'class' : 'TimeWindowCompactionStrategy',
  'compaction_window_unit' : <days>,
  'compaction_window_size' : <int>,
  'split_during_flush' : (true | false) }
Property | Description | Default
---|---|---
compaction_window_unit | Time unit used to define the bucket size. The value is based on the Java TimeUnit enum: MINUTES, HOURS, or DAYS. | DAYS
compaction_window_size | Units per bucket. | 1
split_during_flush | Prevents mixing older data from repairs and hints with newer data from the current time window. During a flush operation, determines whether data partitions are split based on the configured time window. Valid values are true and false. | false
During the flush operation, the data is split into a maximum of 12 windows. Each window holds the data in a separate SSTable. If the current time is <t0> and each window has a time duration of <w>, the data is split across the SSTables as follows:
- SSTable 0 contains data for the time period < <t0> - 10 * <w>.
- SSTables 1 to 10 contain data for the 10 equal time periods from (<t0> - 10 * <w>) through (<t0> - 1 * <w>).
- SSTable 11, the 12th table, contains data for the time period > <t0>.
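A sketch of a time-series table using TWCS with one-day windows (the keyspace, table, and TTL are illustrative):

CREATE TABLE IF NOT EXISTS cycling.sensor_readings (
   sensor_id uuid,
   reading_time timestamp,
   value double,
   PRIMARY KEY (sensor_id, reading_time)
) WITH compaction = {
   'class' : 'TimeWindowCompactionStrategy',
   'compaction_window_unit' : 'DAYS',
   'compaction_window_size' : 1 }
   AND default_time_to_live = 2592000;  -- expire rows after 30 days

Pairing TWCS with a table-level TTL works well because whole SSTables expire together and can be dropped without compaction.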
LeveledCompactionStrategy
The compaction class LeveledCompactionStrategy
(LCS) creates SSTables of a fixed, relatively small size (160 MB by default) that are grouped into levels.
Within each level, SSTables are guaranteed to be non-overlapping.
Each level (L0, L1, L2 and so on) is 10 times as large as the previous.
Disk I/O is more uniform and predictable on higher than on lower levels as SSTables are continuously being compacted into progressively larger levels.
At each level, row keys are merged into non-overlapping SSTables in the next level.
See LeveledCompactionStrategy (LCS).
For more guidance, see When to Use Leveled Compaction and the Leveled Compaction blog.
The following properties only apply to LeveledCompactionStrategy:
compaction = {
  'class' : 'LeveledCompactionStrategy',
  'sstable_size_in_mb' : <int> }
Property | Description | Default
---|---|---
sstable_size_in_mb | The target size for SSTables that use the LeveledCompactionStrategy. Although SSTable sizes should be less than or equal to sstable_size_in_mb, compaction can produce a larger SSTable when data for a given partition key is exceptionally large; the DSE database does not split the data into two SSTables. | 160

The default value, 160 MB, may be inefficient and negatively impact database indexing and the queries that rely on indexes. For example, consider the benefit of using higher values for sstable_size_in_mb in tables that use Storage-Attached Indexing (SAI) indexes. For related information, see Compaction strategies.
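A sketch of a table using LCS with a larger target SSTable size (the keyspace, table, and the 512 MB value are illustrative, not a recommendation):

CREATE TABLE IF NOT EXISTS cycling.rider_stats (
   rider_id uuid PRIMARY KEY,
   stats map<text, double>
) WITH compaction = {
   'class' : 'LeveledCompactionStrategy',
   'sstable_size_in_mb' : 512 };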
DateTieredCompactionStrategy (deprecated)
Use TimeWindowCompactionStrategy instead.
Stores data written within a certain period of time in the same SSTable.
Property | Description | Default
---|---|---
base_time_seconds | The size of the first time window. | 3600
max_sstable_age_days | DSE does not compact an SSTable if its most recent data is older than this property. Fractional days can be set. | 1000
max_window_size_seconds | The maximum window size in seconds. | 86400
timestamp_resolution | Units, MICROSECONDS or MILLISECONDS, to match the timestamp of inserted data. | MICROSECONDS
Usage Notes
All tables must have at least one column that is part of the primary key. A primary key is made up of a partition key and zero or more clustering columns. Partition keys can be composite, that is, made up of multiple columns. A static column is shared by all rows in a partition and is stored only once per partition; a static column cannot be part of the primary key. Primary keys can include frozen collections and UDTs. The sketch below illustrates these rules.
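A minimal sketch (the keyspace and table are hypothetical) with a composite partition key, one clustering column, and a static column:

CREATE TABLE IF NOT EXISTS cycling.rank_by_year_and_name (
   race_year int,
   race_name text,
   rank int,
   cyclist_name text,
   race_location text STATIC,                     -- shared by all rows in a partition; not part of the primary key
   PRIMARY KEY ( (race_year, race_name), rank )   -- composite partition key plus one clustering column
);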
GC Grace Period Considerations
If you lower the gc_grace_seconds value, consider its interaction with these operations:
- Hint replays: When a node goes down and then comes back up, other nodes replay the write operations (called hints) that are queued for that node while it was unresponsive. The database does not replay hints older than gc_grace_seconds after creation. The max_hint_window_in_ms setting in the cassandra.yaml file sets the time limit (3 hours by default) for collecting hints for the unresponsive node.
- Batch replays: Like hint queues, batch operations store database mutations that are replayed in sequence. As with hints, the database does not replay a batched mutation older than gc_grace_seconds after creation. If your application uses batch operations, consider the possibility that decreasing gc_grace_seconds increases the chance that a batched write operation may restore deleted data. The batchlog_replay_throttle_in_kb property in the cassandra.yaml file gives some control over the batch replay process. The most important factors, however, are the size and scope of the batches you use.
Examples
Detailed examples are available for each of the following cases:
- Single partition key in the current keyspace
- Single partition key in another keyspace
- Primary key specified separately
- Mixed-case table name
- Multi-column (composite) partition key
- Clustering columns
- Frozen UDTs
- Geo-spatial data
- With CDC log
- Sort in descending order
- Restore a table using its ID

Two of these cases are sketched below.
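As quick illustrations of two of these cases (hypothetical keyspace, tables, and columns):

-- With CDC log
CREATE TABLE IF NOT EXISTS cycling.comments_cdc (
   record_id timeuuid PRIMARY KEY,
   comment text
) WITH cdc = true;

-- Sort in descending order
CREATE TABLE IF NOT EXISTS cycling.cyclist_category (
   category text,
   points int,
   id uuid,
   PRIMARY KEY (category, points)
) WITH CLUSTERING ORDER BY (points DESC);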