Collection data types overview

Collection data types are a way to group and store data together in a column.

The following collection data types are available in CQL:

  • set: stores a group of unique items, returned in sorted order

  • list: stores ordered items and allows duplicates

  • map: stores key-value pairs with unique keys
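The three types can be declared side by side in one table. The following is a minimal sketch, assuming a keyspace has already been selected with USE; the table name `cyclist_profile`, its columns, and the sample values are hypothetical and chosen only for illustration.

```cql
-- Hypothetical table: one column per collection type.
CREATE TABLE IF NOT EXISTS cyclist_profile (
  id uuid PRIMARY KEY,
  lastname text,
  teams set<text>,           -- set: each team appears at most once
  events list<text>,         -- list: keeps order, may repeat a value
  sponsors map<text, text>   -- map: one value per key
);

-- Literal syntax: {} for a set, [] for a list, {key: value} for a map.
INSERT INTO cyclist_profile (id, lastname, teams, events, sponsors)
VALUES (
  5b6962dd-3f90-4c93-8f61-eabfa4a803e2,
  'VOS',
  {'Rabobank-Liv Woman Cycling Team', 'Nederland bloeit'},
  ['Tour of Flanders', 'Olympic Games', 'Tour of Flanders'],
  {'bike': 'Example Bikes', 'helmet': 'Example Helmets'}
);

-- All three collections come back with the row.
SELECT lastname, teams, events, sponsors
FROM cyclist_profile
WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
```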

Collection columns can be indexed, which allows more versatile querying.
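As a sketch of what indexing a collection looks like, the example below creates an index on a set column and then filters on one of its values. The table name `cyclist_teams` and index name `teams_idx` are hypothetical, and the index type available (for example, a secondary index or a storage-attached index) depends on the database you are running, so treat the `CREATE INDEX` form as an assumption.

```cql
-- Hypothetical table with a set column to index.
CREATE TABLE IF NOT EXISTS cyclist_teams (
  id uuid PRIMARY KEY,
  lastname text,
  teams set<text>
);

-- Index the values stored in the set.
CREATE INDEX IF NOT EXISTS teams_idx ON cyclist_teams (teams);

-- Find every cyclist whose set of teams includes the given value.
SELECT id, lastname
FROM cyclist_teams
WHERE teams CONTAINS 'Rabobank-Liv Woman Cycling Team';
```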

When to use collections

Collections are useful when the data is small and always accessed together. For example, collections are ideal for storing a user’s email addresses or phone numbers, or a cyclist’s teams and events.

CQL avoids a join between two tables by storing the grouped items in a collection column in the user table. In a relational database, this grouping would be stored in a separate table and joined to the user table with one or more foreign keys.
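To illustrate the idea, here is a minimal sketch of a user table that keeps email addresses and phone numbers in collection columns instead of separate tables. The table name `users`, its columns, and the sample data are hypothetical; a keyspace is assumed to be selected already.

```cql
-- Hypothetical user table: the small, always-read-together data
-- lives in collection columns on the same row.
CREATE TABLE IF NOT EXISTS users (
  user_id uuid PRIMARY KEY,
  name text,
  emails set<text>,
  phone_numbers map<text, text>   -- label -> number
);

INSERT INTO users (user_id, name, emails, phone_numbers)
VALUES (
  1f0e2d3c-4b5a-4978-8695-a4b3c2d1e0f9,
  'Alex Example',
  {'alex@example.com', 'alex.work@example.com'},
  {'home': '555-0100', 'mobile': '555-0101'}
);

-- A single-row read returns the user with all contact details; no join.
SELECT name, emails, phone_numbers
FROM users
WHERE user_id = 1f0e2d3c-4b5a-4978-8695-a4b3c2d1e0f9;
```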

If the data has unbounded growth potential, like messages sent or sensor events registered every second, do not use collections. Instead, use a table with a compound primary key where data is stored in the clustering columns. Collections are intended for insertion and retrieval as a collection. They are not intended for querying individual elements within the collection.
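For data that keeps growing, the alternative described above might look like the following sketch. The table name `sensor_events`, its columns, and the sample values are hypothetical; each event becomes its own row under a clustering column rather than an element of a collection.

```cql
-- Hypothetical table: sensor_id is the partition key and
-- event_time is a clustering column, so each event is a separate row.
CREATE TABLE IF NOT EXISTS sensor_events (
  sensor_id uuid,
  event_time timestamp,
  reading double,
  PRIMARY KEY ((sensor_id), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

INSERT INTO sensor_events (sensor_id, event_time, reading)
VALUES (7c9e6679-7425-40de-944b-e07fc1f90ae7, '2024-01-01 00:00:00+0000', 21.5);

-- Rows can be read by time range instead of loading everything at once.
SELECT event_time, reading
FROM sensor_events
WHERE sensor_id = 7c9e6679-7425-40de-944b-e07fc1f90ae7
  AND event_time >= '2024-01-01 00:00:00+0000'
  AND event_time <  '2024-01-02 00:00:00+0000';
```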

Which collection type to use

CQL reads collections in their entirety, so large collections can slow retrieval. In general, keep collections within the following limits to prevent query delays.

Guard rails for non-frozen collections:

  • No more than 2 billion items in a collection.

  • Maximum size of an item in a set is 65,535 bytes.

  • Maximum size of an item in a list or map is 2 GB.

  • Maximum number of keys in a map is 65,535.

Guard rails for frozen collections:

  • The maximum size of a frozen collection is 2 GB, as with the BLOB data type. In general, frozen collections should be smaller than 1 MB to prevent querying delays.

When choosing a collection type, consider the following:

As a rule of thumb, sets perform better than lists, so use a set if you can. Use a list when the order of elements matters or when you need to store the same value multiple times. Use a map when you need to store key-value pairs.
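The difference between the three types shows up in how each one is updated. The sketch below uses a hypothetical table `cyclist_career` with hypothetical column names and values; it is meant only to show the update syntax for each type.

```cql
CREATE TABLE IF NOT EXISTS cyclist_career (
  id uuid PRIMARY KEY,
  teams set<text>,
  events list<text>,
  sponsors map<text, text>
);

-- set: adding a value that is already present leaves the set unchanged.
UPDATE cyclist_career SET teams = teams + {'Team Example'}
WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

-- set: remove a value.
UPDATE cyclist_career SET teams = teams - {'Team Example'}
WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

-- list: append a value; running this twice stores the value twice.
UPDATE cyclist_career SET events = events + ['Tour of Flanders']
WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

-- map: set the value for one key, replacing any earlier value for it.
UPDATE cyclist_career SET sponsors['helmet'] = 'Example Helmets'
WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
```

Because appending to a list is not idempotent, a retried write can leave a duplicate entry, which is one practical reason to prefer a set when order and repetition do not matter.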

Frozen vs non-frozen collections

Frozen collections are more efficient than non-frozen collections, but they can only be updated as a whole. When a SELECT query filters on a frozen collection, the entire collection is read, even if only one element is needed. Another benefit of frozen collections is that they can be used as part of the primary key, which is not possible with non-frozen collections.
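A minimal sketch of both properties, whole-value updates and use in the primary key, is shown below. The table name `route_by_waypoints`, its columns, and the sample values are hypothetical.

```cql
-- Hypothetical table: a frozen list serves as the partition key,
-- and a frozen set is stored as a regular column.
CREATE TABLE IF NOT EXISTS route_by_waypoints (
  waypoints frozen<list<text>>,
  route_name text,
  tags frozen<set<text>>,
  PRIMARY KEY (waypoints)
);

INSERT INTO route_by_waypoints (waypoints, route_name, tags)
VALUES (['Start', 'Climb', 'Finish'], 'Hill loop', {'hilly', 'short'});

-- A frozen column is replaced as a whole value.
UPDATE route_by_waypoints
SET tags = {'hilly', 'short', 'scenic'}
WHERE waypoints = ['Start', 'Climb', 'Finish'];

-- An element-level change such as
--   SET tags = tags + {'scenic'}
-- is rejected for a frozen column.

-- The complete frozen value identifies the row.
SELECT route_name, tags
FROM route_by_waypoints
WHERE waypoints = ['Start', 'Climb', 'Finish'];
```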

Non-frozen collections are more flexible and allow a single value to be updated, but they are slower to read and write. When a SELECT query filters on a non-frozen collection, only the elements that match the filter are read, not the full collection. Thus, non-frozen collections can be filtered with a WHERE clause that uses CONTAINS to match a value, CONTAINS KEY to match a key, or, for maps, map[key] = value. Non-frozen collections can also be nested, provided the nested collection is frozen.
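The filter forms and the nesting rule can be sketched as follows. The table `cyclist_sponsors`, its columns, and the index names are hypothetical. The example also assumes a database version that permits more than one index on the same map column, and that the plain `CREATE INDEX` form is available on your platform.

```cql
CREATE TABLE IF NOT EXISTS cyclist_sponsors (
  id uuid PRIMARY KEY,
  teams set<text>,
  sponsors map<text, text>,
  -- Nesting: the inner collection must be frozen.
  results_by_year map<int, frozen<list<text>>>
);

-- Update a single map entry without rewriting the whole collection.
UPDATE cyclist_sponsors SET sponsors['helmet'] = 'Example Helmets'
WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

-- Indexes backing each filter form (names are illustrative).
CREATE INDEX IF NOT EXISTS cs_teams_idx ON cyclist_sponsors (teams);
CREATE INDEX IF NOT EXISTS cs_sponsor_keys_idx ON cyclist_sponsors (KEYS(sponsors));
CREATE INDEX IF NOT EXISTS cs_sponsor_entries_idx ON cyclist_sponsors (ENTRIES(sponsors));

-- CONTAINS matches a value.
SELECT id FROM cyclist_sponsors WHERE teams CONTAINS 'Team Example';

-- CONTAINS KEY matches a map key.
SELECT id FROM cyclist_sponsors WHERE sponsors CONTAINS KEY 'helmet';

-- map[key] = value matches one specific entry.
SELECT id FROM cyclist_sponsors WHERE sponsors['helmet'] = 'Example Helmets';
```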

If you do not need to update the collection, use a frozen collection to improve performance. If you need to update the collection, use a non-frozen collection.

© 2024 DataStax