Defining a partition key with clustering columns

A compound primary key consists of a partition key, which determines which node stores the data, and one or more clustering columns, which determine the order of the data within the partition.

For a table with a compound primary key, DataStax Enterprise uses a partition key that is either simple or composite, together with one or more clustering columns. Clustering is a storage engine process that sorts data within each partition based on the definition of the clustering columns. By default, rows are sorted in ascending order of the clustering column values. Generally, an ordering chosen to match how the data is queried benefits reads and writes better than this default.
Important: A NULL value cannot be inserted into a PRIMARY KEY column. This restriction applies to both partition keys and clustering columns.
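
As a rough sketch of the two partition key forms, the following CQL defines one table with a simple partition key and one with a composite partition key. The table and column names are hypothetical and chosen only for illustration, and both statements assume a keyspace has already been selected with USE.

```cql
-- Hypothetical tables for illustration only.

-- Simple partition key: race_year alone selects the partition;
-- rank is the clustering column that orders rows inside it.
CREATE TABLE IF NOT EXISTS results_by_year (
  race_year int,
  rank int,
  cyclist_name text,
  PRIMARY KEY (race_year, rank)
);

-- Composite partition key: the inner parentheses group race_year and
-- race_name into one partition key; rank remains the clustering column.
CREATE TABLE IF NOT EXISTS results_by_year_and_race (
  race_year int,
  race_name text,
  rank int,
  cyclist_name text,
  PRIMARY KEY ((race_year, race_name), rank)
);
```

In both forms, the first element of PRIMARY KEY is the partition key and every element after it is a clustering column.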

Remember that data is distributed throughout a cluster. An application can experience high latency while retrieving data from a large partition if the entire partition must be read to gather a small amount of data. On a physical node, when the rows for a partition key are stored in order based on the clustering columns, retrieving a range of rows is very efficient. Grouping data in tables using one or more clustering columns is analogous to using JOINs in a relational database, but typically performs much better because only one table is accessed. In the example table, category is the partition key and points is the clustering column. Notice that within each category, the rows are ordered by points in descending order.
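
A table with this layout might be defined as in the sketch below. Only category and points come from the description above; the table name cyclist_category and the id and lastname columns are assumptions added to make the statement complete, and a keyspace is assumed to be selected already.

```cql
-- Sketch only: the table name and the id/lastname columns are assumed.
CREATE TABLE IF NOT EXISTS cyclist_category (
  category text,   -- partition key: decides which node stores the rows
  points int,      -- clustering column: orders rows within a category
  id uuid,
  lastname text,
  PRIMARY KEY (category, points)
) WITH CLUSTERING ORDER BY (points DESC);
```

Without the CLUSTERING ORDER BY clause, rows in each partition would be stored in ascending order of points.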

The database stores an entire row of data on a node according to its partition key and can order the data for retrieval with clustering columns. Clustering columns make retrieving data from a partition more versatile, because a query can select a range of rows instead of the whole partition. For the example shown, a query could retrieve all point values greater than 200 for the One-day-races category. If your environment has more complex needs for querying, use a compound primary key.
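
The range query described above could look like the following sketch. It assumes a hypothetical table named cyclist_category in which category is the partition key and points is the clustering column; the table name is not given in this section.

```cql
-- Sketch only: cyclist_category is an assumed table name.
-- The equality on category selects a single partition; the range on
-- points is applied to the clustering column within that partition.
SELECT category, points
FROM cyclist_category
WHERE category = 'One-day-races'
  AND points > 200;
```

Because the partition key is restricted by equality and the range applies to the clustering column, the query should not need ALLOW FILTERING, and rows are returned in the table's clustering order.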