Create a table column
Columns are defined at the time of table creation or during table modification.
To create or alter a table, a keyspace must exist.
All data types found in CQL can be used when creating a table column.
Basic types
For basic text and number types, see String types and Numeric types.
BLOBs
A blob (Binary Large OBject) data type represents a constant hexadecimal number defined as 0[xX](hex)+
where hex is a hexadecimal character, such as [0-9a-fA-F]
.
For example, 0xcafe
.
The maximum theoretical size for a blob is 2 GB. The practical limit on blob size, however, is less than 1 MB.
A blob type is suitable for storing a small image or short string.
These functions convert a native type into binary data (blob), and convert a blob back into a native type:
-
typeAsBlob(value)
-
blobAsType(value)
For every native non-blob data type supported by CQL, the typeAsBlob
function takes an argument of that data type and returns it as a blob.
Conversely, the blobAsType
function takes a 64-bit blob argument and converts it to a value of the specified data type, if possible.
The following example shows the use of a blob column and the bigintAsBlob
function:
-
Create a table with a blob column:
CREATE TABLE IF NOT EXISTS cycling.lastname_bio ( lastname varchar PRIMARY KEY, bio blob );
-
Insert data into the table:
INSERT INTO cycling.lastname_bio ( lastname, bio ) VALUES ( 'TSATEVICH', bigintAsBlob(3) );
-
Query the table:
SELECT * FROM cycling.lastname_bio;
Result
lastname | bio -----------+-------------------- TSATEVICH | 0x0000000000000003 (1 rows)
The following example shows the use of blobAsBigint
function:
-
Alter the table to add a
bigint
column:ALTER TABLE cycling.lastname_bio ADD id bigint;
-
Insert additional data into the table:
INSERT INTO cycling.lastname_bio ( lastname, id ) VALUES ( 'DUVAL', blobAsBigint(0x0000000000000003) );
-
Query the table:
SELECT * FROM cycling.lastname_bio;
Result
SELECT * FROM cycling.lastname_bio;
Collections (maps, lists, sets)
Collections are useful for storing a grouping of data within a single column, such as addresses as a collection of street, city, state, and zip code.
Sets, lists, and maps are the defined collection data types.
Counters
A counter column is a column that can contain integer values, and whose primitive operations are increment and decrement.
A counter type is exclusively used for columns that store numbers that are updated by increments or decrements. For example, you could use a counter to track of the number of web page views received on a company website or track of the number of games played online or the number of players who have joined an online game.
The following requirements and limitations apply:
-
You cannot set the value of a counter, you can only either increment or decrement it.
-
A table that contains a counter column must only have a primary key and one or more counter columns.
-
A counter column cannot be part of the primary key or partition key.
-
Many counter-related settings can be set in the cassandra.yaml file.
-
You cannot create an index on a counter column.
-
If you drop a counter column from a table, you cannot re-add it to the same table.
-
You cannot set a counter column’s value to expire using the Time-To-Live (TTL) or
USING TIMESTAMP
properties.
To implement a counter column, create a table that only includes the primary key (one or more columns) and one or more counter columns. For example:
CREATE TABLE IF NOT EXISTS cycling.popular_count (
id UUID PRIMARY KEY,
popularity counter
);
CREATE TABLE cycling.popular_count (
id uuid PRIMARY KEY,
popularity counter
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
Loading data into a counter column is different than other tables.
The data is updated rather than inserted.
To load data into a counter column, or to increase or decrease the value of the counter, use the UPDATE
command.
The following example uses a BATCH
statement to increment the value of the popularity
column by 1, then 125, and then decrements by 64:
BEGIN COUNTER BATCH
UPDATE cycling.popular_count
SET popularity = popularity + 1
WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;
UPDATE cycling.popular_count
SET popularity = popularity + 125
WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;
UPDATE cycling.popular_count
SET popularity = popularity - 64
WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;
APPLY BATCH;
The following example increments the value of the popularity
column by 2.
Note the use of the WHERE
clause to specify the row to update.
UPDATE cycling.popular_count SET popularity = popularity + 2
WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;
After all these updates, the popularity
counter column has a value of 64:
SELECT * FROM cycling.popular_count;
id | popularity
--------------------------------------+------------
6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | 64
(1 rows)
The operations for a counter column are straightforward: Additional increments or decrements change the value of the counter column.
Dates and times
Date and time data types are commonly used for storing time series data or other timestamps.
Geospatial
Geospatial data types are useful for storing information that is geometric in nature, such as mapping information.
Static
Static columns allow you to insert a single value that can be included in multiple rows.
In a table that uses clustering columns, non-clustering columns can be declared static in the table definition.
A static column’s value is shared by all the rows in the table in a particular partition, and is static only within that partition.
The table must define clustering columns in order to define a static column, and only non-clustering columns can be declared static in the table definition. A static column cannot be part of the partition key.
-
Create a table
cycling.country_flag
with a primary key consisting of the columnscountry
andcyclist_name
. Theflag
column is defined as aSTATIC
column.CREATE TABLE IF NOT EXISTS cycling.country_flag ( country text, cyclist_name text, flag int STATIC, PRIMARY KEY (country, cyclist_name) );
-
Insert some data into the table.
INSERT INTO cycling.country_flag ( country, cyclist_name, flag ) VALUES ( 'Belgium', 'Jacques', 1 ); INSERT INTO cycling.country_flag ( country, cyclist_name ) VALUES ( 'Belgium', 'Andre' ); INSERT INTO cycling.country_flag ( country, cyclist_name, flag ) VALUES ( 'France', 'Andre', 2 ); INSERT INTO cycling.country_flag ( country, cyclist_name, flag ) VALUES ( 'France', 'George', 3 );
Note that the
flag
column is entered with the same value for each row in the partition. -
Query the table to see the data.
SELECT * FROM cycling.country_flag;
Resultcountry | cyclist_name | flag ---------+--------------+------ Belgium | Andre | 1 Belgium | Jacques | 1 France | Andre | 3 France | George | 3 (4 rows)
A batch update can be used to update the static column value for all rows in the partition.
For tables that use static columns, the unique partition key identifiers for rows can be retrieved using the DISTINCT
keyword.
Use the DISTINCT
keyword to select static columns.
In this case, the database retrieves only the beginning (static column) of the partition.
SELECT DISTINCT country FROM cycling.country_flag;
country
---------
Belgium
France
(2 rows)
Tuples
Tuples are similar to collections as a data type that allows grouping data within a single column, but are more open-ended in format.
Tuples are a data type that allow two or more values with mixed data type to be stored together in a column.
A user-defined type is another choice, but for simple groupings, a tuple
is a good choice.
A tuple
can accommodate many fields (32768), but should generally be created to hold 2 to 5 fields.
Example: Create a table with a tuple column and index the tuple
Create a table cycling.nation_rank
with a primary key of nation
and using a tuple
to store the rank, cyclist name, and points total for a cyclist.
In the table creation statement, use angle brackets and a comma delimiter to declare the tuple
component types.
CREATE TABLE IF NOT EXISTS cycling.nation_rank (
nation text PRIMARY KEY,
info tuple<int, text, int>
);
The tuple
column can be indexed:
CREATE CUSTOM INDEX info_idx ON cycling.nation_rank (info) USING 'StorageAttachedIndex';
A query can be run to retrieve all the data from the table:
SELECT * FROM cycling.nation_rank;
nation | info
---------+---------------------------------
Spain | (1, 'Alejandro VALVERDE', 9054)
Belgium | (3, 'Phillippe GILBERT', 6222)
France | (2, 'Sylvain CHAVANEL', 6339)
Italy | (4, 'Davide REBELLINI', 6090)
(4 rows)
If a tuple
column is indexed, the data in the table can be filtered using the indexed column:
SELECT * FROM cycling.nation_rank
WHERE info = (3, 'Phillippe GILBERT', 6222);
Example: Create another table with the same data, but with the rank as the primary key
The previous table cycling.nation_rank
uses the nation as the primary key.
It is possible to store the same data using the rank as the primary key, as shown in the following example.
The example creates a table named cycling.popular
using a tuple
to store the country name, cyclist name, and points total for a cyclist, with rank set as the primary key.
CREATE TABLE IF NOT EXISTS cycling.popular (
rank int PRIMARY KEY,
cinfo tuple<text, text, int>
);
Example: Create a table using nested tuples for geographic data
Another example creates a table cycling.route
using a tuple
to store each waypoint location name, latitude, and longitude.
Two tuples are used in the following example, with one tuple nested inside the other tuple.
CREATE TABLE IF NOT EXISTS cycling.route (
race_id int,
race_name text,
point_id int,
lat_long tuple<text, tuple<float, float>>,
PRIMARY KEY (race_id, point_id)
);
Insert some data:
INSERT INTO cycling.route (
race_id, race_name, point_id, lat_long
) VALUES (
500, '47th Tour du Pays de Vaud', 1, ('Onnens', (46.8444, 6.6667))
);
INSERT INTO cycling.route (
race_id, race_name, point_id, lat_long
) VALUES (
500, '47th Tour du Pays de Vaud', 2, ('Champagne', (46.833, 6.65))
);
INSERT INTO cycling.route (
race_id, race_name, point_id, lat_long
) VALUES (
500, '47th Tour du Pays de Vaud', 3, ('Novalle', (46.833, 6.6))
);
INSERT INTO cycling.route (
race_id, race_name, point_id, lat_long
) VALUES (
500, '47th Tour du Pays de Vaud', 4, ('Vuiteboeuf', (46.8, 6.55))
);
INSERT INTO cycling.route (
race_id, race_name, point_id, lat_long
) VALUES (
500, '47th Tour du Pays de Vaud', 5, ('Baulmes', (46.7833, 6.5333))
);
INSERT INTO cycling.route (
race_id, race_name, point_id, lat_long
) VALUES (
500, '47th Tour du Pays de Vaud', 6, ('Les Clées', (46.7222, 6.5222))
);
Query the data to retrieve all the data from the table:
SELECT * FROM cycling.route;
race_id | point_id | lat_long | race_name
---------+----------+----------------------------------+---------------------------
500 | 1 | ('Onnens', (46.8444, 6.6667)) | 47th Tour du Pays de Vaud
500 | 2 | ('Champagne', (46.833, 6.65)) | 47th Tour du Pays de Vaud
500 | 3 | ('Novalle', (46.833, 6.6)) | 47th Tour du Pays de Vaud
500 | 4 | ('Vuiteboeuf', (46.8, 6.55)) | 47th Tour du Pays de Vaud
500 | 5 | ('Baulmes', (46.7833, 6.5333)) | 47th Tour du Pays de Vaud
500 | 6 | ('Les Clées', (46.7222, 6.5222)) | 47th Tour du Pays de Vaud
(6 rows)
UDTs
User-defined types (UDTs) are the ultimate in free-form data types, allowing a user to compile an unique grouping of values that will provide specific results upon querying.
UUIDs
UUIDs are a common data type for uniquely identifying a column value that avoids collision with other values.
See UUID and timeuuid.
Vectors
Use the vector data type if vector search will be performed on a column in the table.
For example, create a table named cycling.comments_vs
with a vector
column to store the embeddings for vector search:
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
Other types
For more types, such as Boolean and inet, see CQL data types.