Sorting and limiting results

Using the SELECT command for simple queries.

Similar to SQL, CQL can SELECT data using simple or complex qualifiers. At its simplest, a query selects all data in a table. At its most complex, a query delineates which data to retrieve and display and even calculates new values based on user-defined functions. This section uses several example tables in the cycling keyspace.

For SASI indexing, see queries in Using SASI.

Controlling the number of rows returned using PER PARTITION LIMIT

The PER PARTITION LIMIT option sets the maximum number of rows that the query returns from each partition. Create a table that will sort data into more than one partition.

CREATE TABLE cycling.rank_by_year_and_name (
  race_year int,
  race_name text,
  cyclist_name text,
  rank int,
  PRIMARY KEY ((race_year, race_name), rank)
);

After inserting data, the table holds:

 race_year | race_name                                  | rank | cyclist_name
-----------+--------------------------------------------+------+----------------------
      2014 |                        4th Tour of Beijing |    1 |    Phillippe GILBERT
      2014 |                        4th Tour of Beijing |    2 |        Daniel MARTIN
      2014 |                        4th Tour of Beijing |    3 | Johan Esteban CHAVES
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |        Daniel MARTIN
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 | Johan Esteban CHAVES
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    3 |      Benjamin PRADES
      2015 |   Giro d'Italia - Stage 11 - Forli > Imola |    1 |        Ilnur ZAKARIN
      2015 |   Giro d'Italia - Stage 11 - Forli > Imola |    2 |      Carlos BETANCUR
      2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |      Benjamin PRADES
      2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 |          Adam PHELAN
      2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    3 |         Thomas LEBAS

Now, to get the top two racers in every race year and race name, use the following command with PER PARTITION LIMIT 2.

SELECT *
FROM cycling.rank_by_year_and_name
PER PARTITION LIMIT 2;

Output:

 race_year | race_name                                  | rank | cyclist_name
-----------+--------------------------------------------+------+----------------------
      2014 |                        4th Tour of Beijing |    1 |    Phillippe GILBERT
      2014 |                        4th Tour of Beijing |    2 |        Daniel MARTIN
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |        Daniel MARTIN
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 | Johan Esteban CHAVES
      2015 |   Giro d'Italia - Stage 11 - Forli > Imola |    1 |        Ilnur ZAKARIN
      2015 |   Giro d'Italia - Stage 11 - Forli > Imola |    2 |      Carlos BETANCUR
      2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |      Benjamin PRADES
      2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 |          Adam PHELAN

Procedure

  • Use a simple SELECT query to display all data from the cyclist category table.
    CREATE TABLE cycling.cyclist_category (
      category text,
      points int,
      id UUID,
      lastname text,
      PRIMARY KEY (category, points))
    WITH CLUSTERING ORDER BY (points DESC);
    
    SELECT *
    FROM cycling.cyclist_category;
     category      | points | id                                   | lastname
    ---------------+--------+--------------------------------------+------------
     One-day-races |    367 | 220844bf-4860-49d6-9a4b-6b5d3a79cbfb |  TIRALONGO
     One-day-races |    198 | 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | KRUIJSWIJK
        Time-trial |    182 | 220844bf-4860-49d6-9a4b-6b5d3a79cbfb |  TIRALONGO
        Time-trial |      3 | 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | KRUIJSWIJK
            Sprint |     39 | 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | KRUIJSWIJK
            Sprint |      0 | 220844bf-4860-49d6-9a4b-6b5d3a79cbfb |  TIRALONGO
                GC |   1324 | 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | KRUIJSWIJK
                GC |   1269 | 220844bf-4860-49d6-9a4b-6b5d3a79cbfb |  TIRALONGO
  • The example below illustrates how to create a query that uses category as a filter.
    SELECT *
    FROM cycling.cyclist_category 
    WHERE category = 'Sprint';
     category | points | id                                   | lastname
    ----------+--------+--------------------------------------+------------
       Sprint |     39 | 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | KRUIJSWIJK
       Sprint |      0 | 220844bf-4860-49d6-9a4b-6b5d3a79cbfb |  TIRALONGO

    The database rejects this query if category is not a partition key or clustering column, because the query cannot locate the data on disk. Queries require a sequential retrieval across the entire cyclist_category table. In DataStax Enterprise, this is a crucial concept to grasp; scanning all data across all nodes is prohibitively slow and thus blocked from execution. The use of partition key and clustering columns in a WHERE clause must result in the selection of a contiguous set of rows.

    You can filter queries using secondary indexes, see Building and maintaining indexes. A query based on lastname can result in satisfactory results if the lastname column is indexed.

  • Clustering columns can be defined in WHERE clauses if ALLOW FILTERING is also used even if a secondary index is not created. The table definition is given and then the SELECT command. Note that race_start_date is a clustering column that has no secondary index.
    CREATE TABLE cycling.calendar (
      race_id int, 
      race_name text, 
      race_start_date timestamp, 
      race_end_date timestamp, 
      PRIMARY KEY (race_id, race_start_date, race_end_date)
    );
    SELECT *
    FROM cycling.calendar 
    WHERE race_start_date = '2015-06-13'
    ALLOW FILTERING;
     race_id | race_start_date                 | race_end_date                   | race_name
    ---------+---------------------------------+---------------------------------+----------------
         102 | 2015-06-13 00:00:00.000000+0000 | 2015-06-21 00:00:00.000000+0000 | Tour de Suisse
  • You can also pick the columns to display instead of choosing all data.
    SELECT category, points, lastname
    FROM cycling.cyclist_category;
     category      | points | lastname
    ---------------+--------+------------
     One-day-races |    367 |  TIRALONGO
     One-day-races |    198 | KRUIJSWIJK
        Time-trial |    182 |  TIRALONGO
        Time-trial |      3 | KRUIJSWIJK
            Sprint |     39 | KRUIJSWIJK
            Sprint |      0 |  TIRALONGO
                GC |   1324 | KRUIJSWIJK
                GC |   1269 |  TIRALONGO
  • For a large table, limit the number of rows retrieved using LIMIT. The default limit is 10,000 rows. To sample data, pick a smaller number. To retrieve more than 10,000 rows, set LIMIT to a large value.
    CREATE TABLE cycling.cyclist_name (
      id UUID PRIMARY KEY,
      lastname text,
      firstname text
    );
    
    SELECT id, lastname, firstname
    FROM cycling.cyclist_name
    LIMIT 3;
     id                                   | lastname | firstname
    --------------------------------------+----------+-----------
     e7ae5cf3-d358-4d99-b900-85902fda9bb0 |    FRAME |      Alex
     fb372533-eb95-4bb4-8685-6ef61e994caa | MATTHEWS |      null
     5b6962dd-3f90-4c93-8f61-eabfa4a803e2 |      VOS |      null                    
  • You can fine-tune the display order using the ORDER BY clause. The partition key must be defined in the WHERE clause and the ORDER BY clause defines the clustering column to use for ordering.
    CREATE TABLE cycling.cyclist_cat_pts (
      category text,
      points int,
      id UUID,lastname text,
      PRIMARY KEY (category, points)
    ); 
    
    SELECT *
    FROM cycling.cyclist_cat_pts
    WHERE category = 'GC'
    ORDER BY points ASC;
     category | points | id                                   | lastname
    ----------+--------+--------------------------------------+------------
           GC |    780 | 829aa84a-4bba-411f-a4fb-38167a987cda | SUTHERLAND
           GC |   1269 | 220844bf-4860-49d6-9a4b-6b5d3a79cbfb |  TIRALONGO                    
  • Tuples are retrieved in their entirety. This example uses AS to change the header of the tuple name.
    SELECT
      race_name, point_id,
      lat_long AS CITY_LATITUDE_LONGITUDE
    FROM cycling.route;
  • The PER PARTITION LIMIT option sets the maximum number of rows that the query returns from each partition. This is interesting because it allows a query to select a "Top 3" selection if the partitions are separated correctly. Create a table that will sort data into more than one partition and insert some data:
    CREATE TABLE cycling.rank_by_year_and_name ( 
      race_year int, 
      race_name text, 
      cyclist_name text, 
      rank int, 
      PRIMARY KEY ((race_year, race_name), rank)
    );
     race_year | race_name                                  | rank | cyclist_name
    -----------+--------------------------------------------+------+----------------------
          2014 |                        4th Tour of Beijing |    1 |    Phillippe GILBERT
          2014 |                        4th Tour of Beijing |    2 |        Daniel MARTIN
          2014 |                        4th Tour of Beijing |    3 | Johan Esteban CHAVES
          2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |        Daniel MARTIN
          2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 | Johan Esteban CHAVES
          2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    3 |      Benjamin PRADES
          2015 |   Giro d'Italia - Stage 11 - Forli > Imola |    1 |        Ilnur ZAKARIN
          2015 |   Giro d'Italia - Stage 11 - Forli > Imola |    2 |      Carlos BETANCUR
          2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |      Benjamin PRADES
          2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 |          Adam PHELAN
          2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    3 |         Thomas LEBAS
  • Now use a PER PARTITION LIMIT to get the top two races for each race year and race name pair:
    SELECT *
    FROM cycling.rank_by_year_and_name
    PER PARTITION LIMIT 2;
     race_year | race_name                                  | rank | cyclist_name
    -----------+--------------------------------------------+------+----------------------
          2014 |                        4th Tour of Beijing |    1 |    Phillippe GILBERT
          2014 |                        4th Tour of Beijing |    2 |        Daniel MARTIN
          2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |        Daniel MARTIN
          2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 | Johan Esteban CHAVES
          2015 |   Giro d'Italia - Stage 11 - Forli > Imola |    1 |        Ilnur ZAKARIN
          2015 |   Giro d'Italia - Stage 11 - Forli > Imola |    2 |      Carlos BETANCUR
          2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |      Benjamin PRADES
          2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 |          Adam PHELAN