Primary keys in tables

When you create a table, you must specify a primary key. A primary key consists of one or more columns. The primary key is the unique identifier for rows in the table.

Columns defined in the primary key are automatically indexed and available for querying. For more information about the role of indexes in tables, see Indexes in tables.

You cannot use map, list, or set columns in primary keys.

Due to a known issue with filtering on blob columns, DataStax does not recommend using blob columns in primary keys.

There are three types of primary keys that you can define. The type of key you use depends on your data model and the types of queries you plan to run.

Single-column primary keys

A single-column primary key is a primary key consisting of one column.

This option is best for use cases where you usually retrieve rows by a single value. For example, you could use this strategy for a small customer database where every customer is uniquely identified by their email address, and you always look up customers by their email address.
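
The email lookup described above can be pictured with a small in-memory sketch. This is only a toy model in plain Python, not the database's API: the `customers` dictionary, the `insert_customer` helper, and the sample rows are all hypothetical names chosen for illustration.

```python
# Toy in-memory model (NOT the database API): each row is stored under
# the value of one column, which acts as the single-column primary key.
customers = {}  # email -> row


def insert_customer(row):
    """Store a row under its email, the one value that identifies it."""
    customers[row["email"]] = row


# Hypothetical sample rows.
insert_customer({"email": "ana@example.com", "name": "Ana", "plan": "basic"})
insert_customer({"email": "raj@example.com", "name": "Raj", "plan": "pro"})

# Retrieve a row by the single value that uniquely identifies it.
found = customers.get("raj@example.com")
print(found["name"])  # Raj
```

The point of the sketch is that one value is enough to find exactly one row, which is the access pattern this key type suits.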

For examples of how to define a single-column primary key, see the "Create a table with a single-column primary key" example.

Composite primary keys

A composite primary key is a primary key consisting of multiple columns. The rows are uniquely identified by the combination of the values from each column.

This strategy can make queries more efficient by creating partitions (groups) of rows based on each primary key column. For example, if your primary key includes country and city, the database has implicit groups of rows with the same country or city, making it more efficient to search for rows in a specific country or city.

This is a moderately complex strategy that allows for more nuanced queries and more complex unique identifiers. It can be useful if your rows are uniquely defined by values from multiple columns or your data falls into natural groupings, such as location or time. For example, you could use this strategy for scenarios such as the following:

A manufacturing database that uniquely identifies products by the production date, factory location, and SKU
A global customer database that groups customers by country or locality, in addition to an identifier, such as customer ID or email address
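
As a rough illustration of the manufacturing scenario, the following sketch models a row identifier built from the combination of several column values. It is a plain-Python toy model under stated assumptions, not the database's API: the column names, the `key_of` helper, and the sample rows are hypothetical.

```python
# Toy model (NOT the database API): a row is identified by the
# combination of values from several columns, not by any one of them.
KEY_COLUMNS = ("production_date", "factory", "sku")  # hypothetical column names


def key_of(row):
    """Build the composite identifier from the key columns, in order."""
    return tuple(row[column] for column in KEY_COLUMNS)


# Hypothetical sample rows. Each pair of rows shares some key values,
# but no two rows share all three.
rows = [
    {"production_date": "2024-05-01", "factory": "Austin", "sku": "A-100", "qty": 40},
    {"production_date": "2024-05-01", "factory": "Austin", "sku": "B-200", "qty": 15},
    {"production_date": "2024-05-01", "factory": "Lyon", "sku": "A-100", "qty": 22},
]

products = {key_of(row): row for row in rows}

# All three rows remain distinct because their combined keys differ.
print(len(products))  # 3

# Looking up one row requires a value for every key column.
match = products[("2024-05-01", "Lyon", "A-100")]
print(match["qty"])  # 22
```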

For composite primary keys, avoid columns with low cardinality (low diversity of values). For example, a customer database with an overabundance of customers from a single country might not benefit from partitioning by country. Instead, you could use locality identifiers, such as states or postal codes, to break large customer segments into smaller groups for more efficient queries.
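
To see why a skewed, low-cardinality column makes a poor grouping, it can help to count group sizes for a sample of the data before choosing key columns. The sketch below is a plain-Python check on made-up rows, not a database feature: the `country` and `postal_code` column names and the sample values are assumptions for illustration.

```python
from collections import Counter

# Hypothetical sample: most customers are in one country,
# but they are spread across several postal codes.
customers = [
    {"country": "US", "postal_code": "78701"},
    {"country": "US", "postal_code": "78701"},
    {"country": "US", "postal_code": "10001"},
    {"country": "US", "postal_code": "10001"},
    {"country": "US", "postal_code": "94105"},
    {"country": "US", "postal_code": "94105"},
    {"country": "FR", "postal_code": "69001"},
    {"country": "JP", "postal_code": "100-0001"},
]

# Group sizes when grouping by country alone.
by_country = Counter(c["country"] for c in customers)

# Group sizes when a locality column is added to the grouping.
by_locality = Counter((c["country"], c["postal_code"]) for c in customers)

largest_country_group = max(by_country.values())
largest_locality_group = max(by_locality.values())

print(largest_country_group)   # 6
print(largest_locality_group)  # 2
```

In this sample, grouping by country leaves one group holding six of the eight rows, while adding the postal code caps every group at two rows.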

For examples of how to define a composite primary key, see the "Create a table with a composite primary key" example.

Compound primary keys

A compound primary key is a primary key consisting of partition (grouping) columns and clustering (sorting) columns. The rows are uniquely identified by the combination of the values from each column.

This is the most complex partitioning strategy, but it can provide the most flexibility and efficiency for querying data, if it is appropriate for your data model. This strategy can be useful for scenarios where you need to perform range queries or sort time-series data.

For example, assume you have a retail database where each row represents an order, and the orders are partitioned by customer ID and clustered by purchase date. In this case, the database implicitly groups each customer’s orders together, and then sorts each customer’s orders by purchase date. This can make it more efficient to retrieve a customer’s most recent orders when they contact customer service or when they check the status of their orders in their account.
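
The retail example can be sketched as a toy in-memory model: one group of orders per customer ID, with each group kept sorted by purchase date. This is plain Python for illustration only, not the database's API or storage engine. The `insert_order` and `recent_orders` helpers, the ID formats, and the use of ISO date strings (which sort chronologically as text) are assumptions.

```python
import bisect
from collections import defaultdict

# Toy model (NOT the database API):
# customer_id (partition column) -> list of (purchase_date, order_id),
# kept sorted by purchase_date (clustering column).
orders_by_customer = defaultdict(list)


def insert_order(customer_id, purchase_date, order_id):
    """Place the order in its customer's group, preserving date order."""
    bisect.insort(orders_by_customer[customer_id], (purchase_date, order_id))


def recent_orders(customer_id, limit):
    """Return up to `limit` orders, newest first, from one sorted group."""
    group = orders_by_customer.get(customer_id, [])
    return group[-limit:][::-1]


# Hypothetical orders, inserted out of date order.
insert_order("c1", "2024-03-10", "o-3")
insert_order("c1", "2024-01-05", "o-1")
insert_order("c2", "2024-02-01", "o-9")
insert_order("c1", "2024-02-20", "o-2")

latest = recent_orders("c1", 2)
print(latest)  # [('2024-03-10', 'o-3'), ('2024-02-20', 'o-2')]
```

Because each group is already sorted, finding a customer's most recent orders only reads the end of one group instead of scanning and sorting every order.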

You can have multiple partition columns and multiple clustering columns. When clustering on multiple columns, the order in which you declare the columns matters. For example, if you cluster by date and name, the data is first sorted by date, and then any rows with the same date are sorted by name.
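
The effect of declaration order can be illustrated with an ordinary multi-key sort. The sketch below uses plain Python on made-up rows and does not reflect how the database stores data. The `date` and `name` columns follow the example above, and the sample values are hypothetical.

```python
# Hypothetical rows within one group, in arbitrary insertion order.
rows = [
    {"date": "2024-06-02", "name": "Bo"},
    {"date": "2024-06-01", "name": "Cy"},
    {"date": "2024-06-02", "name": "Al"},
    {"date": "2024-06-01", "name": "Al"},
]

# Sort by date first, then by name among rows with the same date.
by_date_then_name = sorted(rows, key=lambda r: (r["date"], r["name"]))

# Reversing the column order produces a different arrangement.
by_name_then_date = sorted(rows, key=lambda r: (r["name"], r["date"]))

print([(r["date"], r["name"]) for r in by_date_then_name])
# [('2024-06-01', 'Al'), ('2024-06-01', 'Cy'), ('2024-06-02', 'Al'), ('2024-06-02', 'Bo')]

print([(r["name"], r["date"]) for r in by_name_then_date])
# [('Al', '2024-06-01'), ('Al', '2024-06-02'), ('Bo', '2024-06-02'), ('Cy', '2024-06-01')]
```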

In compound primary keys, depending on your data model, avoid clustering columns with very high cardinality (high diversity of values). For example, a purchase number column might not be ideal for clustering because it is unlikely to contain duplicates. Instead, choose clustering columns with moderate cardinality, such as purchase date, while avoiding columns with extremely low cardinality, such as booleans.

For examples of how to define a compound primary key, see the "Create a table with a compound primary key" example.
