GROUP BY clause

Group by one or more columns. Condenses the selected rows that share the same values for a set of columns or values returned by a function into a group. Either one or more primary key columns or a deterministic function or aggregate can be used in the GROUP BY clause.

SELECT race_date, race_time FROM cycling.race_times_summary
  GROUP BY race_date;
Results
 race_date  | race_time
------------+--------------------
 2019-03-21 | 10:01:18.000000000
 2018-07-26 | 10:01:18.000000000
 2017-04-14 | 10:01:18.000000000

(3 rows)

Warnings :
Aggregation query used without partition key

Each set of rows with the same race_date column value are grouped together into one row in the query output. Three rows are returned because there are three groups of rows with the same race_date column value. The value returned is the first value that is found for the group.

ORDER BY clause

You can fine-tune the display order using the ORDER BY clause. The partition key must be defined in the WHERE clause and then the ORDER BY clause defines one or more clustering columns to use for ordering. The order of the specified columns must match the order of the clustering columns in the PRIMARY KEY definition. The options for ordering are ASC (ascending) and DESC (descending).

If no order is specified, the results are returned in the stored order.

Note that using both IN and ORDER BY require turning off paging with the PAGING OFF command in cqlsh.

SELECT * FROM cycling.calendar WHERE race_id IN (100, 101, 102)
ORDER BY race_start_date ASC;
Results
 race_id | race_start_date                 | race_end_date                   | race_name
---------+---------------------------------+---------------------------------+-----------------------
     100 | 2013-05-07 00:00:00.000000+0000 | 2014-05-29 00:00:00.000000+0000 |         Giro d'Italia
     101 | 2013-06-05 00:00:00.000000+0000 | 2013-06-12 00:00:00.000000+0000 | Criterium du Dauphine
     102 | 2013-06-11 00:00:00.000000+0000 | 2013-06-19 00:00:00.000000+0000 |        Tour de Suisse
     100 | 2014-05-08 00:00:00.000000+0000 | 2014-05-30 00:00:00.000000+0000 |         Giro d'Italia
     101 | 2014-06-06 00:00:00.000000+0000 | 2014-06-13 00:00:00.000000+0000 | Criterium du Dauphine
     102 | 2014-06-12 00:00:00.000000+0000 | 2014-06-20 00:00:00.000000+0000 |        Tour de Suisse
     100 | 2015-05-09 00:00:00.000000+0000 | 2015-05-31 00:00:00.000000+0000 |         Giro d'Italia
     101 | 2015-06-07 00:00:00.000000+0000 | 2015-06-14 00:00:00.000000+0000 | Criterium du Dauphine
     102 | 2015-06-13 00:00:00.000000+0000 | 2015-06-21 00:00:00.000000+0000 |        Tour de Suisse

(9 rows)

The ORDER BY clause also supports vector searches of the vector column. The result set is sorted using the Approximate Nearest Neighbor (ANN) algorithm with the supplied array values.

LIMIT clause

If a query returns a large number of rows, you can limit the number of rows returned, to limit the amount of data returned. The default limit is set to 10,000 rows, the number of rows cqlsh allows. This examples limits the rows to 3:

SELECT * FROM cycling.comments_vs 
  ORDER BY comment_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55] 
  LIMIT 3;
Results
 id                                   | created_at                      | comment                                | comment_vector               | commenter | record_id
--------------------------------------+---------------------------------+----------------------------------------+------------------------------+-----------+--------------------------------------
 e8ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-04-01 14:33:02.160000+0000 |              rain, rain,rain, go away! | [0.9, 0.54, 0.12, 0.1, 0.95] |      John | 70157590-4084-11ef-893f-0fd142cea825
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-04-01 14:33:02.160000+0000 | LATE RIDERS SHOULD NOT DELAY THE START | [0.9, 0.54, 0.12, 0.1, 0.95] |      Alex | 700ebed0-4084-11ef-893f-0fd142cea825
 e8ae5df3-d358-4d99-b900-85902fda9bb0 | 2017-04-01 14:33:02.160000+0000 |                    Rain like a monsoon | [0.9, 0.54, 0.12, 0.1, 0.95] |      Jane | 701611d0-4084-11ef-893f-0fd142cea825

(3 rows)

PER PARTITION LIMIT clause

The PER PARTITION LIMIT option sets the maximum number of rows that the query returns from each partition. This will only apply to tables that spread across more than one partition. An example of such a table is defined here:

USE cycling;
CREATE TABLE rank_by_year_and_name ( 
  race_year int, 
  race_name text, 
  cyclist_name text, 
  rank int, 
  PRIMARY KEY ((race_year, race_name), rank) 
);

where the partition key is a composite of race_year and race_name.

The following query returns the top two cyclists from each partition stored:

SELECT rank, cyclist_name AS name FROM cycling.rank_by_year_and_name 
  PER PARTITION LIMIT 2;
Results

ALLOW FILTERING

The ALLOW FILTERING clause allows you to perform queries that require scanning all partitions, with no primary key columns specified. It should not be used in production as it can cause severe performance issues! When initially modeling your data, you should avoid using ALLOW FILTERING and instead model your data to avoid it. However, for a small dataset or for testing purposes, it can be useful. It may even help you identify where you need to add indexes to your data model.

For more information, see Allow Filtering explained.

The following query selects the birthday and nationality columns from the cyclist_alt_stats table, with the ALLOW FILTERING clause:

SELECT lastname, birthday, nationality FROM cycling.cyclist_alt_stats
  WHERE birthday = '1991-08-25' AND nationality = 'Ethiopia'
ALLOW FILTERING;
Results

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com