Partition keys

Identify partition keys to limit the query and retrieve data from specific nodes.

Identifying the partition in a WHERE clause allows the database to quickly retrieve the data from the appropriate node. Avoid running queries across multiple partitions whenever possible. The database requires that all partition keys are restricted or none. All the partition key columns are required to compute the hash and locate the node containing the partition.

If no restrictions are specified on the partition keys but some are specified on the clustering keys, ALLOW FILTERING is required to execute the query.

Tip: DataStax recommends limiting queries to a single partition to avoid performance issues that occur with multi-partition operations. Performance issues can occur when using the IN operator, omitting a WHERE clause with logical statements that identifies the partition, or other operators that require the ALLOW FILTERING option. For more details, see ALLOW FILTERING explained.
Use the following operators for partition key logical statements:
  • Equals (=)
  • IN
  • Ranges (>, >=, <, <=) on tokenized partition keys

Prerequisites

This procedure uses the rank_by_year_and_name example.

Procedure

  • Exact values using equals (=) operator
    To filter on a regular column or clustering columns, restrict all the partition key columns.
    SELECT rank, cyclist_name as name FROM cycling.rank_by_year_and_name
    WHERE race_name = 'Tour of Japan - Stage 4 - Minami > Shinshu' AND race_year = 2015
    AND rank <= 2;
  • Values in a list for the last partition key column using the IN operator
    Use the IN operator on the last partition key column. For example, to return multiple years of the same race and stage:
    SELECT rank, cyclist_name as name FROM cycling.rank_by_year_and_name
    WHERE race_name = 'Tour of Japan - Stage 4 - Minami > Shinshu' AND race_year IN (2014, 2015);
    Warning: Using IN query across partitions might cause performance issues; see Not using the “in” query for multiple partitions for a detailed explanation.

    Results are returned in the natural order of the column type.

  • Partition range using the TOKEN function with >, >=, <=, or < operators

    The database distributes the partition across the nodes using the selected partitioner. Only the ByteOrderedPartitioner keeps an ordered distribution of data. Select partitions in a range by tokenizing the partition keys and using greater than and less than operators.

    To filter the results using a token range:
    SELECT TOKEN(race_year, race_name), race_name AS name, race_year AS year
    FROM cycling.rank_by_year_and_name
    WHERE TOKEN(race_year, race_name) >= -3074457345618258603
    AND TOKEN(race_year, race_name) <= 3074457345618258602;