Collection data types overview
When to use collections
Collections are useful when the data is small and always accessed together. For example, collections are ideal for storing a user’s email addresses or phone numbers, or a cyclist’s teams and events.
CQL avoids joins between two tables by storing the grouped items in a collection column in the user table. In a relational database, this grouping would be stored in a separate table and joined to the user table through one or more foreign keys.
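As a minimal sketch of this pattern, the following CQL stores a user's email addresses and phone numbers alongside the user row. The table name, column names, and UUID value are illustrative assumptions, not taken from an existing schema.

```cql
-- Hypothetical table: each user row carries its own small collections.
CREATE TABLE IF NOT EXISTS users (
    user_id uuid PRIMARY KEY,
    name text,
    emails set<text>,          -- a handful of addresses, always read with the user
    phones map<text, text>     -- label -> number, for example 'home' -> '555-0100'
);

-- Insert the user and the grouped items in a single write.
INSERT INTO users (user_id, name, emails, phones)
VALUES (
    5b6962dd-3f90-4c93-8f61-eabfa4a803e2,
    'Alice',
    {'alice@example.com', 'alice@example.org'},
    {'home': '555-0100', 'mobile': '555-0101'}
);

-- One read returns the user together with the collections; no join is needed.
SELECT name, emails, phones
FROM users
WHERE user_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
```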
If the data has unbounded growth potential, like messages sent or sensor events registered every second, do not use collections. Instead, use a table with a compound primary key where data is stored in the clustering columns. Collections are intended for insertion and retrieval as a collection. They are not intended for querying individual elements within the collection.
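For data that keeps growing, the alternative described above might look like the following sketch. The `sensor_events` table, its columns, and the literal values are hypothetical and chosen only for illustration.

```cql
-- Hypothetical table: one row per event instead of one ever-growing collection.
CREATE TABLE IF NOT EXISTS sensor_events (
    sensor_id uuid,
    event_time timestamp,
    reading double,
    PRIMARY KEY ((sensor_id), event_time)   -- sensor_id partitions, event_time clusters
) WITH CLUSTERING ORDER BY (event_time DESC);

-- Each event is an independent row, so individual events can be queried by range.
SELECT event_time, reading
FROM sensor_events
WHERE sensor_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2
  AND event_time >= '2024-01-01 00:00:00+0000'
LIMIT 100;
```

Because the clustering column orders rows within the partition, a slice of events can be read without loading everything recorded for the sensor.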
Which collection type to use
CQL reads collections in their entirety, so large collections can slow retrieval. In general, collections should stay within the following limits to prevent query delays.
Guard rails for non-frozen collections:
- No more than 2 billion items in a collection.
- Maximum size of an item in a set is 65,535 bytes.
- Maximum size of an item in a list or map is 2 GB.
- Maximum number of keys in a map is 65,535.
Guard rails for frozen collections:
- The maximum size of a frozen collection is 2 GB, as with the BLOB data type. In general, frozen collections should be smaller than 1 MB to prevent querying delays.
When choosing a collection type, consider the following rules of thumb:
- Sets generally perform better than lists, so use a set if you can.
- Use a list when the order of elements matters or when you need to store the same value multiple times.
- Use a map when you need to store key-value pairs.
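The three collection types can be sketched side by side as follows. The `cyclist_career` table and its columns are assumptions made for illustration.

```cql
-- Hypothetical table showing one column of each collection type.
CREATE TABLE IF NOT EXISTS cyclist_career (
    cyclist_id uuid PRIMARY KEY,
    teams set<text>,           -- unique values; element order is not meaningful
    events list<text>,         -- insertion order is kept; duplicates are allowed
    sponsors map<text, text>   -- key-value pairs, for example item -> sponsor
);

INSERT INTO cyclist_career (cyclist_id, teams, events, sponsors)
VALUES (
    5b6962dd-3f90-4c93-8f61-eabfa4a803e2,
    {'Team A', 'Team B'},
    ['Spring Classic', 'Summer Tour', 'Spring Classic'],
    {'helmet': 'Acme', 'bike': 'Example Cycles'}
);
```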
Frozen vs non-frozen collections
Frozen collections are more efficient than non-frozen collections, but they can only be updated as a whole.
When a SELECT query filters on a frozen collection, the entire collection is read, even if only one element is needed.
Another benefit of frozen collections is that they can be used as part of the primary key, which is not possible with non-frozen collections.
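A frozen collection used as a primary key might look like the sketch below. The `routes_by_waypoints` table and its columns are hypothetical.

```cql
-- Hypothetical table keyed by a frozen list.
-- The whole list is treated as a single value, so it can serve as the key.
CREATE TABLE IF NOT EXISTS routes_by_waypoints (
    waypoints frozen<list<text>>,
    route_name text,
    PRIMARY KEY (waypoints)
);

INSERT INTO routes_by_waypoints (waypoints, route_name)
VALUES (['Start', 'Hilltop', 'Finish'], 'Short loop');

-- The lookup must supply the complete list, in the same order.
SELECT route_name
FROM routes_by_waypoints
WHERE waypoints = ['Start', 'Hilltop', 'Finish'];
```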
Non-frozen collections are more flexible because a single value can be updated, but they are slower to read and write.
When a SELECT query filters on a non-frozen collection, only the elements that match the filter are evaluated and read, not the full collection.
Thus, non-frozen collections can be filtered with a WHERE clause that uses CONTAINS to match a value, CONTAINS KEY to match a key, or, for maps, map[key] = value.
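The three filter forms can be sketched as follows. The `cyclist_profile` table, its columns, and the index names are hypothetical. The sketch assumes secondary indexes on the collection columns; depending on the database version, a query without an index may instead need ALLOW FILTERING.

```cql
-- Hypothetical table with non-frozen collections.
CREATE TABLE IF NOT EXISTS cyclist_profile (
    cyclist_id uuid PRIMARY KEY,
    teams set<text>,
    sponsors map<text, text>
);

-- Assumed indexes: set values, map keys, and map entries.
CREATE INDEX IF NOT EXISTS profile_teams_idx ON cyclist_profile (teams);
CREATE INDEX IF NOT EXISTS profile_sponsor_keys_idx ON cyclist_profile (KEYS(sponsors));
CREATE INDEX IF NOT EXISTS profile_sponsor_entries_idx ON cyclist_profile (ENTRIES(sponsors));

-- CONTAINS matches a value in the set.
SELECT cyclist_id FROM cyclist_profile WHERE teams CONTAINS 'Team A';

-- CONTAINS KEY matches a key in the map.
SELECT cyclist_id FROM cyclist_profile WHERE sponsors CONTAINS KEY 'helmet';

-- map[key] = value matches a specific entry in the map.
SELECT cyclist_id FROM cyclist_profile WHERE sponsors['helmet'] = 'Acme';
```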
Non-frozen collections can also contain nested collections, provided that the nested collection is frozen.
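A nested collection could be declared as in this sketch, where the outer map is non-frozen and each inner list is frozen. The `cyclist_results` table and its columns are assumptions made for illustration.

```cql
-- Hypothetical table: year -> list of races ridden that year.
CREATE TABLE IF NOT EXISTS cyclist_results (
    cyclist_id uuid PRIMARY KEY,
    races_by_year map<int, frozen<list<text>>>
);

-- The outer map is non-frozen, so a single entry can be set on its own.
-- The inner list is frozen, so it is written as a whole value.
UPDATE cyclist_results
SET races_by_year[2023] = ['Spring Classic', 'Summer Tour']
WHERE cyclist_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
```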
If you do not need to update individual elements of the collection, use a frozen collection to improve performance. If you need to update individual elements, use a non-frozen collection.
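The difference in update behavior can be sketched with one column of each kind. The `user_contacts` table and its columns are hypothetical.

```cql
-- Hypothetical table with a non-frozen and a frozen set.
CREATE TABLE IF NOT EXISTS user_contacts (
    user_id uuid PRIMARY KEY,
    emails set<text>,                   -- non-frozen: elements can change individually
    archived_emails frozen<set<text>>   -- frozen: replaced only as a whole
);

-- Non-frozen: add or remove a single element without rewriting the set.
UPDATE user_contacts
SET emails = emails + {'new@example.com'}
WHERE user_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

UPDATE user_contacts
SET emails = emails - {'old@example.com'}
WHERE user_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

-- Frozen: the only supported update is to overwrite the entire value.
-- An element-level operation such as archived_emails + {...} is rejected.
UPDATE user_contacts
SET archived_emails = {'a@example.com', 'b@example.com'}
WHERE user_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
```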