Defining a multi-column partition key

A partition key can have a partition key defined with multiple table columns which determines which node stores the data.

For a table with a composite partition key, DataStax Enterprise uses multiple columns as the partition key. These columns form logical sets inside a partition to facilitate retrieval. In contrast to a simple partition key, a composite partition key uses two or more columns to identify where data resides. Composite partition keys are used when the data stored is too large to reside in a single partition. Using more than one column for the partition key breaks the data into chunks, or buckets. The data is still grouped, but in smaller chunks. This method can be effective if a cluster experiences hotspotting, or congestion in writing data to one node repeatedly, because a partition is heavily writing. DSE is often used for time series data, and hotspotting can be a real issue. Breaking incoming data into buckets by year:month:day:hour, using four columns to route to a partition can decrease hotspots.
Important: A NULL value cannot be inserted into a PRIMARY KEY column. This restriction applies to both partition keys and clustering columns.

Data is retrieved using the partition key. Keep in mind that to retrieve data from the table, values for all columns defined in the partition key have to be supplied if secondary indexes are not used. The table shown uses race_year and race_name in the primary key, as a composite partition key. To retrieve data, both parameters must be identified.

The database stores an entire row of data on a node by partition key. If there is too much data in a partition and data needs to be spread among multiple nodes, use a composite partition key.

Using a composite partition key

Use a composite partition key to identify where data is stored in multiple partitions.

Use a composite partition key in a primary key to create a set of columns that can distribute data across multiple partitions, which can be queried to return sorted results.

The following example creates a rank_by_year_and_name table storing the ranking and name of cyclists who competed in races. The table uses race_year and race_name as the columns defining the composite partition key of the Primary_key. The query discovers the ranking of cyclists who competed in races by supplying year and race name values.
Important: A NULL value cannot be inserted into a PRIMARY KEY column. This restriction applies to both partition keys and clustering columns.

A composite partition key table can be created in a few different ways as shown in the following examples.

Procedure

  • Create the table rank_by_year_and_name in the cycling keyspace. Use race_year and race_name for the composite partition key. The table definition shown has an additional column rank used in the primary key. Before creating the table, set the keyspace with a USE statement. This example identifies the primary key at the end of the table definition. Note the double parentheses around the first two columns defined in the PRIMARY KEY.
    USE cycling;
    CREATE TABLE IF NOT EXISTS rank_by_year_and_name ( 
      race_year int, 
      race_name text, 
      cyclist_name text, 
      rank int, 
      PRIMARY KEY ((race_year, race_name), rank) 
    );
  • The keyspace name can be used to identify the keyspace in the CREATE TABLE statement instead of the USE statement.
    CREATE TABLE IF NOT EXISTS cycling.rank_by_year_and_name (
      race_year int,
      race_name text,
      cyclist_name text,
      rank int,
      PRIMARY KEY ((race_year, race_name), rank)
    );