Collections in CQL

Collection data types are a way to group and store data together in a column. The following collection data types are available in CQL:

set: Store unordered items
list: Store ordered items
map: Store key-value pairs

Collections are indexable for more versatile querying.

When to use collections

Collections are useful when the data is small and the data is always accessed together. For example, collections are ideal for storing a user’s email addresses, phone numbers, or a cyclist’s teams and events.

CQL avoids joins between two tables by storing the grouping of items in a collection column in the user table. In a relational database, this grouping would be stored in separate tables and joined between tables with one or more foreign keys.

If the data has unbounded growth potential, like messages sent or sensor events registered every second, do not use collections. Instead, use a table with a compound primary key where data is stored in the clustering columns. Collections are intended for insertion and retrieval as a collection. They are not intended for querying individual elements within the collection.

Which collection type to use

CQL reads collections in their entirety, so retrieval performance can be affected. In general, collections should be smaller than the following maximum size to prevent querying delays.

Guard rails for non-frozen collections:

No more than 2 billion items in a collection.
Maximum size of an item in a set is 65,535 bytes.
Maximum size of an item in a list or map is 2 GB.
Maximum number of keys in a map is 65,535.

Guard rails for frozen collections:

The maximum size of a frozen collection is 2 GB, as with the BLOB data type. In general, frozen collections should be smaller than 1 MB to prevent querying delays.

When choosing a collection type, consider the following:

A good rule of thumb is that sets are more performant than lists, so use a set if you can. Use a list when the order of elements matter or when you need to store the same value multiple times. Use a map when you need to store key-value pairs.

Frozen vs non-frozen collections

Frozen collections, are more efficient than non-frozen collections, but they can only be updated as a whole. When filtering a SELECT query, the entire frozen collection is used and read, even if only one element is needed. Another benefit of frozen collections is that they can be used as part of the primary key, which is not possible with non-frozen collections.

Non-frozen collections are more flexible, and can update a single value, but they are slower to read and write. When filtering a SELECT query, only the elements that match the filter are executed and read, not a full collection. Thus, non-frozen collections can be filtered with a WHERE clause that uses CONTAINS to match a value, CONTAINS KEY to match a key, or for maps, map[key] = value. Non-frozen collections can also be nested, if the nested collection is frozen.

If you will not need to update the collection, use frozen collections to improve performance. If you need to update the collection, use non-frozen collections.

Create a collection

To create a collection, you must define the collection type, column name, and column data type when you create or alter a table.

Collections are not paged internally. Reading a range of values is not supported for collections.
Make sure you understand the use and importance of the primary key in table schema.

Create set collections

A set consists of a unordered group of elements with unique values. Duplicate values will not be stored distinctly. The values of a set are stored unordered, but will return the elements in sorted order when queried. Use the set data type to store data that has a many-to-one relationship with another column.

Non-frozen set

To create a non-frozen set collection in a table, specify the collection column name and column data type enclosed in angle brackets:

CREATE TABLE IF NOT EXISTS cycling.cyclist_career_teams (
  id UUID PRIMARY KEY,
  lastname text,
  teams set<text>
);

The set collection is created with the column name teams, and the data type text:

Result

CREATE TABLE cycling.cyclist_career_teams (
    id uuid PRIMARY KEY,
    lastname text,
    teams set<text>
) WITH additional_write_policy = '99PERCENTILE'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND nodesync = {'enabled': 'true', 'incremental': 'true'}
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99PERCENTILE';

Frozen set

To create a frozen set collection in a table specify the collection column name and a column data type enclosed with angle brackets, with the FROZEN keyword:

CREATE TABLE IF NOT EXISTS cycling.race_results (
  race_name text PRIMARY KEY,
  race_history frozen<set<int>>,
  top_three frozen<map<int, text>>
);

The set collection is created with the column name race_history, and the data type int.

Result

CREATE TABLE cycling.race_results (
    race_name text PRIMARY KEY,
    race_history frozen<set<int>>,
    top_three frozen<map<int, text>>
) WITH additional_write_policy = '99PERCENTILE'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND nodesync = {'enabled': 'true', 'incremental': 'true'}
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99PERCENTILE';

Create list collections

A list is similar to a set; it groups and stores values. Unlike a set, the values stored in a list do not need to be unique and can be duplicated. In addition, a list stores the elements in a particular order and may be inserted or retrieved according to an index value.

Use the list data type to store data that has a possible many-to-many relationship with another column.

Non-frozen list

To create a non-frozen list collection in a table, specify a collection column name, and a column data type enclosed in angle brackets:

CREATE TABLE IF NOT EXISTS cycling.upcoming_calendar (
  year int,
  month int,
  events list<text>,
  PRIMARY KEY (year, month)
);

The list collection is created with the column name events, and the data type text.

Result

CREATE TABLE cycling.upcoming_calendar (
    year int,
    month int,
    events list<text>,
    PRIMARY KEY (year, month)
) WITH CLUSTERING ORDER BY (month ASC)
    AND additional_write_policy = '99PERCENTILE'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND nodesync = {'enabled': 'true', 'incremental': 'true'}
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99PERCENTILE';

Frozen list

To create a frozen list collection in a table, specify a collection column name and a column data type enclosed in angle brackets, with the FROZEN keyword:

CREATE TABLE IF NOT EXISTS cycling.race_starts (
  cyclist_name text PRIMARY KEY,
  rnumbers FROZEN<LIST<int>>
);

The list collection is created with the column name rnumbers, and the data type int.

Result

CREATE TABLE cycling.race_starts (
    cyclist_name text PRIMARY KEY,
    rnumbers frozen<list<int>>
) WITH additional_write_policy = '99PERCENTILE'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND nodesync = {'enabled': 'true', 'incremental': 'true'}
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99PERCENTILE';

Create map collections

A map relates one item to another with a key-value pair. For each key, only one value may exist, and duplicates cannot be stored. Both the key and the value are designated with a data type.

For example, using the map type, you can store timestamp-related information in user profiles. Each element of a non-frozen map is internally stored as a single column that you can modify, replace, delete, and query. Each element can have an individual time-to-live and expire when the TTL ends.

Non-frozen map

To create a non-frozen map collection in a table, specify a collection column name, and a pair of column data types enclosed in angle brackets:

CREATE TABLE IF NOT EXISTS cycling.cyclist_teams (
  id uuid PRIMARY KEY,
  firstname text,
  lastname text,
  teams map<int, text>
);

The map collection is created with the column name teams, and the data types will have a key year of integer type, and a team name value of text type.

Result

CREATE TABLE cycling.cyclist_teams (
    id uuid PRIMARY KEY,
    firstname text,
    lastname text,
    teams map<int, text>
) WITH additional_write_policy = '99PERCENTILE'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND nodesync = {'enabled': 'true', 'incremental': 'true'}
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99PERCENTILE';

Frozen map

To create a frozen map collection in a table, specify a collection column name, and a pair of column data types enclosed in angle brackets, with the FROZEN keyword:

CREATE TABLE IF NOT EXISTS cycling.race_results (
  race_name text PRIMARY KEY,
  race_history frozen<set<int>>,
  top_three frozen<map<int, text>>
);

The map collection is created with the column name top_three, and the data types will have a key rank of integer type, and a cyclist name value of text type.

Result

CREATE TABLE cycling.race_results (
    race_name text PRIMARY KEY,
    race_history frozen<set<int>>,
    top_three frozen<map<int, text>>
) WITH additional_write_policy = '99PERCENTILE'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND nodesync = {'enabled': 'true', 'incremental': 'true'}
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99PERCENTILE';

Check a collection

To check if a collection column exists in a table, see Check column existence.

Alter a collection

To alter the definition of a collection column in a table, see Alter a column.

Drop a collection

To drop a collection column from a table, see Drop a column.

Insert or update collection data

To insert or edit the data in a collection column in a table, use an INSERT or UPDATE statement.

To write to a collection, you must have a table with a collection column.

Insert or update a set

Set values must be unique, because no order is defined in a set internally.

A frozen set column only allows you to insert or update the entire set. Elements cannot be prepended or appended to the set unless it is non-frozen.

Insert data into the set, enclosing values in curly braces:

INSERT INTO cycling.cyclist_career_teams (
  id, lastname, teams
 ) VALUES (
  5b6962dd-3f90-4c93-8f61-eabfa4a803e2,
  'VOS',
  {
    'Rabobank-Liv Woman Cycling Team',
    'Rabobank-Liv Giant',
    'Rabobank Women Team',
    'Nederland bloeit'
  }
);

Add an element to a set using an UPDATE statement and the addition (+) operator:

UPDATE cycling.cyclist_career_teams
  SET teams = teams + {'Team DSB - Ballast Nedam'}
  WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

Remove an element from a set using the subtraction (-) operator:

UPDATE cycling.cyclist_career_teams
  SET teams = teams - {'DSB Bank Nederland bloeit'}
  WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

Remove all elements from a set by using an UPDATE or DELETE statement

If you remove all elements from a set, the empty set is stored as a null set.

UPDATE cycling.cyclist_career_teams SET teams = {}
WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

DELETE teams FROM cycling.cyclist_career_teams
  WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

A query for the teams returns null:

SELECT id, lastname, teams FROM cycling.cyclist_career_teams
  WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

Result

 id                                   | lastname | teams
--------------------------------------+----------+-------
 5b6962dd-3f90-4c93-8f61-eabfa4a803e2 |      VOS |  null

(1 rows)

Insert or update a list

A frozen list column only allows you to insert or update the entire list. Elements cannot be prepended or appended to the list unless it is non-frozen.

Insert data into the list, enclosing values in square brackets:

INSERT INTO cycling.upcoming_calendar ( year, month, events )
VALUES ( 2015, 06, [ 'Criterium du Dauphine', 'Tour de Suisse' ] );

INSERT INTO cycling.upcoming_calendar ( year, month, events )
VALUES ( 2015, 07, [ 'Tour de France' ] );

Use an UPDATE statement to insert values into the list:

Prepend an element to the list by enclosing it in square brackets and using the addition (+) operator:
```
UPDATE cycling.upcoming_calendar SET events = [ 'Tour de France' ] + events
  WHERE year = 2015 AND month = 06;
```
This update operation is implemented internally without any read-before-write. Prepending a new element to the list writes only the new element.
Append an element to the list by switching the order of the new element data and the list name in the UPDATE statement:
```
UPDATE cycling.upcoming_calendar SET events = events + [ 'Tour de France' ]
  WHERE year = 2017 AND month = 05;
```
This update operation is implemented internally without any read-before-write. Appending a new element to the list writes only the new element.
Add an element at a particular position using the list index position in square brackets:
```
UPDATE cycling.upcoming_calendar SET events[2] = 'Vuelta Ciclista a Venezuela'
  WHERE year = 2015 AND month = 06;
```
To add an element at a particular position, the database reads the entire list, and then rewrites the part of the list that needs to be shifted to the new index positions. Consequently, adding an element at a particular position results in greater latency than appending or prepending an element to a list.

Remove an element from a list, use a DELETE statement and the list index position in square brackets. For example, to remove the first event in the list:

DELETE events[0] FROM cycling.upcoming_calendar
  WHERE year = 2015 AND month = 07;

The method of removing elements using an indexed position from a list requires an internal read. In addition, the client-side application could only discover the indexed position by reading the whole list and finding the values to remove, adding additional latency to the operation. If another thread or client prepends elements to the list before the operation is done, incorrect data will be removed.

Remove all elements having a particular value using an UPDATE statement, the subtraction operator (-), and the list value in square brackets.
```
UPDATE cycling.upcoming_calendar SET events = events - ['Tour de France Stage 10']
  WHERE year = 2015 AND month = 07;
```
This is a safer and faster way to delete values from a list than by index position.

Insert or update a map

A frozen map column only allows you to insert or update the entire map. Elements cannot be prepended or appended to the map unless it is non-frozen.

Set or replace map data, using an INSERT or UPDATE statement. Enclose the integer and text values in a map collection with curly braces, separated by a colon.

INSERT INTO cycling.cyclist_teams (
  id, firstname, lastname, teams
) VALUES (
  5b6962dd-3f90-4c93-8f61-eabfa4a803e2,
  'Marianne',
  'VOS',
  {
    2015 : 'Rabobank-Liv Woman Cycling Team',
    2014 : 'Rabobank-Liv Woman Cycling Team'
  }
);

Using INSERT in this manner replaces the entire map.

Use an UPDATE statement to insert values into the map:

Append an element to the map by enclosing the key-value pair in curly braces and using the addition (+) operator:

UPDATE cycling.cyclist_teams
  SET teams = teams + { 2009 : 'DSB Bank - Nederland bloeit' }
  WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

Set a specific element using an UPDATE statement, enclosing the specific key of the element (an integer) in square brackets, and using the equals operator (=) to map the value assigned to the key:
```
UPDATE cycling.cyclist_teams
  SET teams[2006] = 'Team DSB - Ballast Nedam'
  WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
```
Delete an element from the map using a DELETE statement and enclosing the specific key of the element in square brackets:
```
DELETE teams[2009] FROM cycling.cyclist_teams
  WHERE id=e7cd5752-bc0d-4157-a80f-7523add8dbcd;
```
Remove all elements having a particular value using an UPDATE statement, the subtraction operator (-), and the map key values in curly braces:
```
UPDATE cycling.cyclist_teams
  SET teams = teams - { 2013, 2014 }
  WHERE id = e7cd5752-bc0d-4157-a80f-7523add8dbcd;
```

Collections in CQL

When to use collections

Which collection type to use

Frozen vs non-frozen collections

Create a collection

Create set collections

Create list collections

Create map collections

Check a collection

Alter a collection

Drop a collection

Insert or update collection data

Insert or update a set

Insert or update a list

Insert or update a map

Was this helpful?

Give Feedback