SELECT
Returns data from a single table.
A SELECT
statement without a WHERE
clause is not recommended because all rows from all partitions are returned.
Queries that span multiple partitions can seriously impact performance.
See also: CREATE TABLE
Syntax
SELECT [ JSON | DISTINCT ] <selectors> FROM [<keyspace_name>.]<table_name> [ WHERE [ <primary_key_conditions> | <non_primary_key_conditions> IN (<column_name> [, ...]) ALLOW FILTERING ] [ AND ] [ <index_conditions> ] [ GROUP BY <column_name> [ , ... ] ] [ ORDER BY <column_name> ( ASC | DESC ) [ , ... ] ] | [ ORDER BY <vector_column_name> ANN OF [n,n,n,...] [ LIMIT N ] ] [ ( LIMIT <N> [ OFFSET <N> ] | PER PARTITION LIMIT <N> ) ] [ ALLOW FILTERING ] ;
Syntax legend
Syntax conventions | Description |
---|---|
UPPERCASE |
Literal keyword. |
Lowercase |
Not literal. |
|
Variable value. Replace with a user-defined value. |
|
Optional.
Square brackets ( |
|
Group.
Parentheses ( |
|
Or.
A vertical bar ( |
|
Repeatable.
An ellipsis ( |
|
Single quotation ( |
|
Map collection.
Braces ( |
Set, list, map, or tuple.
Angle brackets ( |
|
|
End CQL statement.
A semicolon ( |
|
Separate the command line options from the command arguments with two hyphens ( |
|
Search CQL only: Single quotation marks ( |
|
Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrConfig files. |
Parameters
Parameter | Description | Default |
---|---|---|
|
Optional. Name of the keyspace that contains the table to select. |
If no name is specified, the current keyspace is used. |
|
Name of the table to select. |
|
|
Name of the column to select. |
|
|
Name of the vector column to select. |
Selectors
The SELECT
statement is used to query data from a table in CQL.
The results are returned as a set of rows.
Selectors are used to specify which columns are queried and returned.
The SELECT
statement can also transform the data on return using functions.
Table data also has related metadata that can be returned using metadata functions.
Return all columns
To return all columns from a table, use the wildcard *
selector.
SELECT * FROM cycling.cyclist_name;
Results
id | firstname | lastname
--------------------------------------+-----------+-----------------
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | Alex | FRAME
fb372533-eb95-4bb4-8685-6ef61e994caa | Michael | MATTHEWS
5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | Marianne | VOS
220844bf-4860-49d6-9a4b-6b5d3a79cbfb | Paolo | TIRALONGO
6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | Steven | KRUIKSWIJK
e7cd5752-bc0d-4157-a80f-7523add8dbcd | Anna | VAN DER BREGGEN
(6 rows)
Since the *
selector returns all columns, it is not necessary to specify the column names.
The more columns a table has, the more resources are required to return the data.
Return specific columns
Because the number of columns returned can affect performance, it is recommended to specify the columns to return, if possible.
For example, to return only the lastname
column from the cycling.cyclist_name
table:
SELECT lastname FROM cycling.cyclist_name;
Results
lastname
-----------------
FRAME
MATTHEWS
VOS
TIRALONGO
KRUIKSWIJK
VAN DER BREGGEN
(6 rows)
To return multiple columns, separate the column names with a comma:
SELECT lastname, firstname FROM cycling.cyclist_name;
Results
lastname | firstname
-----------------+-----------
FRAME | Alex
MATTHEWS | Michael
VOS | Marianne
TIRALONGO | Paolo
KRUIKSWIJK | Steven
VAN DER BREGGEN | Anna
(6 rows)
Columns can be renamed on return using an alias:
SELECT id, cyclist_age AS age FROM cycling.cyclist_alt_stats LIMIT 3;
Results
id | age
--------------------------------------+------
e0953617-07eb-4c82-8f91-3b2757981625 | null
1ba0417d-62da-4103-b710-de6fb222db6f | 34
4ceb495c-55ab-4f71-83b9-81117252bf13 | 34
(3 rows)
A column alias cannot be used in the |
Return transformed data
The SELECT
statement can transform the data on return using functions.
The functions are varied and can be used to manipulate the data in different ways.
For example, JSON
can be used to return all columns as JSON, toJson
to return selected column as JSON, or CAST
can be used to change the data type of a column.
Distinct column values can be returned using the DISTINCT
function.
The number of rows with a particular column can be returned using the COUNT
function.
Any native or user-defined functions can be used to transform the data, such as the SUM
or AVG
function.
Any user-defined aggregate function can be used to transform the data, such as the average
function.
The TOKEN
function can be used to return the token value of a column.
To output selected data from a table in JSON format, use the JSON
function:
SELECT JSON * FROM cycling.birthday_list;
Results
[json]
----------------------------------------------------------------------------------------------------------
{"cyclist_name": "Claudio HEINEN", "blist": {"age": "23", "bday": "27/07/1992", "nation": "GERMANY"}}
{"cyclist_name": "Claudio VANDELLI", "blist": {"age": "54", "bday": "27/07/1961", "nation": "ITALY"}}
{"cyclist_name": "Luc HAGENAARS", "blist": {"age": "28", "bday": "27/07/1987", "nation": "NETHERLANDS"}}
{"cyclist_name": "Toine POELS", "blist": {"age": "52", "bday": "27/07/1963", "nation": "NETHERLANDS"}}
{"cyclist_name": "Allan DAVIS", "blist": {"age": "35", "bday": "27/07/1980", "nation": "AUSTRALIA"}}
{"cyclist_name": "Laurence BOURQUE", "blist": {"age": "23", "bday": "27/07/1992", "nation": "CANADA"}}
(6 rows)
Note the nested JSON structure in the output, with the column name as the key and the column value as the value.
To specify the JSON format for a particular column, enclose its name in the toJson()
function:
SELECT id, commenter, toJson(created_at) FROM cycling.comments_vs;
Results
id | commenter | system.tojson(created_at)
--------------------------------------+-----------+----------------------------
e8ae5cf3-d358-4d99-b900-85902fda9bb0 | John | "2017-04-01 14:33:02.160Z"
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | Alex | "2017-04-01 14:33:02.160Z"
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | Alex | "2017-03-21 21:11:09.999Z"
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | Alex | "2017-02-14 20:43:20.000Z"
c7fceba0-c141-4207-9494-a29f9809de6f | Amy | "2024-07-02 22:39:30.758Z"
c7fceba0-c141-4207-9494-a29f9809de6f | Amy | "2017-04-01 13:43:08.030Z"
c7fceba0-c141-4207-9494-a29f9809de6f | Amy | "2017-03-22 01:16:59.001Z"
c7fceba0-c141-4207-9494-a29f9809de6f | Amy | "2017-02-17 08:43:20.234Z"
c7fceba0-c141-3207-9494-a29f9809de6f | Amy | "2017-02-17 08:43:20.234Z"
c7fceba0-c141-7207-9494-a29f9809de6f | Amy | "2024-07-02 22:39:30.763Z"
e8ae5df3-d358-4d99-b900-85902fda9bb0 | Jane | "2017-04-01 14:33:02.160Z"
c7fceba0-c141-4207-9594-a29f9809de6f | Jane | "2017-02-17 08:43:20.234Z"
(12 rows)
Note that the JSON-formatted timestamp
column can include complete time zone information.
To cast a column to a different data type, use the CAST
function:
SELECT CAST(created_at AS date) FROM cycling.comments;
Results
cast(created_at as date)
--------------------------
2023-04-01
2024-06-07
(2 rows)
If a table has duplicate values in a column, use the DISTINCT
function to return only the unique values:
SELECT DISTINCT country FROM cycling.country_flag;
Results
country
---------
Belgium
France
(2 rows)
You may want to count the number of rows in a column.
Use the COUNT
function to return the count:
SELECT start_month, MAX(start_day), COUNT(*) FROM cycling.events
WHERE year = 2017 AND discipline = 'Cyclo-cross';
Results
start_month | system.max(start_day) | count
-------------+-----------------------+-------
1 | 28 | 11
(1 rows)
To use the COUNT
function with a WHERE
clause, include the COUNT
function in the SELECT
list and the WHERE
clause in the query.
For example, to get the number of cyclists from Belgium:
SELECT count(cyclist_name) FROM cycling.country_flag
WHERE country='Belgium';
Results
system.count(cyclist_name)
----------------------------
2
(1 rows)
A number of native functions can be used to transform the data.
The data is transformed only on return, rather than stored in the table.
These are general functions that are familiar to most people, such as the average, or AVG
function:
SELECT AVG(race_points) FROM cycling.cyclist_points
WHERE id = e3b19ec4-774a-4d1c-9e5a-decec1e30aac;
Results
system.avg(race_points)
-------------------------
67
(1 rows)
User-defined functions can also be used to transform the data. For examples, the 'left` function can be used to return a column that is left-justified:
SELECT left(firstname, 1), lastname FROM cycling.cyclist_name;
Results
User-defined aggregates can also be used to transform the data.
For examples, the 'average` function can be used to retrieve the average of the column cyclist_time_sec
from a table:
SELECT average(cyclist_time_sec) AS Average FROM cycling.team_average
WHERE team_name = 'UnitedHealthCare Pro Cycling Womens Team';
Results
average
-------------
11474.66667
(1 rows)
Warnings :
User defined function cycling.average_state : (tuple<int, bigint>, int) -> tuple<int, bigint> consumed more than 500µs CPU time (2120µs)
The TOKEN
function can be used to return the token value of a column.
The token value is the hashed value of the partition key, thus it can only be used on the partition key column or columns.
This function is useful when using the token value in a query to filter data with a WHERE
clause.
To return the token value of the race_id
column in the cycling.race
table:
SELECT race_name AS name,race_year AS year FROM cycling.rank_by_year_and_name
WHERE TOKEN(race_year,race_name) >= 4582455970709790046;
Results
name | year
--------------------------------------------+------
Tour of Japan - Stage 4 - Minami > Shinshu | 2015
Tour of Japan - Stage 4 - Minami > Shinshu | 2015
Tour of Japan - Stage 4 - Minami > Shinshu | 2015
(3 rows)
Return column metadata
Each non-primary key column in a table has metadata associated with it.
The WRITETIME
function returns the timestamp of the last write to a column.
The TTL
function returns the time to live of a column.
It is important to note that the metadata functions are not available for all columns.
For example, the metadata functions are not available for columns with a data type of counter
or primary key columns.
Additionally, the metadata functions are not available for columns that are part of a collection or a user-defined type that is not frozen.
To return the writetime of the firstname
column in the cycling.cyclist_points
table:
SELECT WRITETIME (firstname) FROM cycling.cyclist_points
WHERE id = e3b19ec4-774a-4d1c-9e5a-decec1e30aac;
Results
writetime(firstname)
----------------------
1719960338817118
1719960338823276
1719960338827071
(3 rows)
The TTL
function returns the time to live of a column.
This function is useful when using Time to Live (TTL) to expire data in a table.
If a TTL
is set on a column, the data is automatically deleted after the specified time has elapsed.
To return the TTL of the race_points
column in the cycling.cyclist_points
table:
SELECT TTL(race_name) FROM cycling.calendar WHERE race_id = 200;
Results
ttl(race_name)
----------------
199
(1 rows)
Options
GROUP BY clause
Group by one or more columns.
Condenses the selected rows that share the same values for a set of columns or values returned by a function into a group.
Either one or more primary key columns or a deterministic function or aggregate can be used in the GROUP BY
clause.
SELECT race_date, race_time FROM cycling.race_times_summary
GROUP BY race_date;
Results
race_date | race_time
------------+--------------------
2019-03-21 | 10:01:18.000000000
2018-07-26 | 10:01:18.000000000
2017-04-14 | 10:01:18.000000000
(3 rows)
Warnings :
Aggregation query used without partition key
Each set of rows with the same race_date
column value are grouped together into one row in the query output.
Three rows are returned because there are three groups of rows with the same race_date
column value.
The value returned is the first value that is found for the group.
This query groups the rows by race_date
and FLOOR(race_time, 1h)
, which returns the hour.
The number of rows in each group is returned by COUNT(*)
.
SELECT race_date, FLOOR(race_time, 1h), COUNT(*) FROM cycling.race_times_summary
GROUP BY race_date, FLOOR(race_time, 1h);
Results
race_date | system.floor(race_time, 1h) | count
------------+-----------------------------+-------
2019-03-21 | 10:00:00.000000000 | 2
2019-03-21 | 11:00:00.000000000 | 1
2019-03-21 | 12:00:00.000000000 | 1
2018-07-26 | 10:00:00.000000000 | 2
2018-07-26 | 11:00:00.000000000 | 1
2018-07-26 | 12:00:00.000000000 | 1
2017-04-14 | 10:00:00.000000000 | 2
2017-04-14 | 11:00:00.000000000 | 1
2017-04-14 | 12:00:00.000000000 | 1
(9 rows)
Warnings :
Aggregation query used without partition key
Nine rows are returned because there are nine groups of rows with the same race_date
and FLOOR(race_time, 1h)
values.
ORDER BY clause
You can fine-tune the display order using the ORDER BY
clause.
The partition key must be defined in the WHERE
clause and then the ORDER BY
clause defines one or more clustering columns to use for ordering.
The order of the specified columns must match the order of the clustering columns in the PRIMARY KEY
definition.
The options for ordering are ASC
(ascending) and DESC
(descending).
If no order is specified, the results are returned in the stored order.
Note that using both |
SELECT * FROM cycling.calendar WHERE race_id IN (100, 101, 102)
ORDER BY race_start_date ASC;
Results
The ORDER BY
clause also supports vector searches of the vector column.
The result set is sorted using the Approximate Nearest Neighbor (ANN) algorithm with the supplied array values.
LIMIT clause
If a query returns a large number of rows, you can limit the number of rows returned, to limit the amount of data returned.
The default limit is set to 10,000 rows, the number of rows cqlsh
allows.
This examples limits the rows to 3:
SELECT * FROM cycling.comments_vs
ORDER BY comment_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55]
LIMIT 3;
Results
id | created_at | comment | comment_vector | commenter | record_id
--------------------------------------+---------------------------------+----------------------------------------+---------------------------------------------------+-----------+--------------------------------------
e8ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-04-01 14:33:02.160000+0000 | rain, rain,rain, go away! | b'?fff?\\n=q=\\xf5\\xc2\\x8f=\\xcc\\xcc\\xcd?s33' | John | f25e4fe1-38c3-11ef-bd85-f92c3c7170c3
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-04-01 14:33:02.160000+0000 | LATE RIDERS SHOULD NOT DELAY THE START | b'?fff?\\n=q=\\xf5\\xc2\\x8f=\\xcc\\xcc\\xcd?s33' | Alex | f259bc00-38c3-11ef-bd85-f92c3c7170c3
e8ae5df3-d358-4d99-b900-85902fda9bb0 | 2017-04-01 14:33:02.160000+0000 | Rain like a monsoon | b'?fff?\\n=q=\\xf5\\xc2\\x8f=\\xcc\\xcc\\xcd?s33' | Jane | f25eec21-38c3-11ef-bd85-f92c3c7170c3
(3 rows)
OFFSET option
The LIMIT
clause can also use the OFFSET
option to skip a number of rows before returning the results.
OFFSET
cannot be used without LIMIT
.
This is an expensive feature for large offsets and should be avoided.
Non-offset pagination is a better option for large datasets.
There are guardrail settings in the cassandra.yaml
file that will prevent offsets that are too large.
The offset_rows_warn_threshold
defaults to 10,000 rows. The offset_rows_failure_threshold
defaults to 20,000 rows.
There are a number of restrictions on the use of OFFSET
:
-
Specifying an OFFSET disables normal key-based paging.
-
Using
OFFSET 0
also disables key-based paging. -
ANN
queries don’t supportOFFSET
.
This example limits the rows to 3, but skips the first two rows:
SELECT comment,comment_vector,commenter FROM cycling.comments_vs
WHERE commenter : 'Alex'
LIMIT 3 OFFSET 2;
Results
comment | comment_vector | commenter
----------------------------------------+-------------------------------+-----------
Raining too hard should have postponed | [0.45, 0.09, 0.01, 0.2, 0.11] | Alex
(1 rows)
PER PARTITION LIMIT clause
The PER PARTITION LIMIT
option sets the maximum number of rows that the query returns from each partition.
This will only apply to tables that spread across more than one partition.
An example of such a table is defined here:
USE cycling;
CREATE TABLE rank_by_year_and_name (
race_year int,
race_name text,
cyclist_name text,
rank int,
PRIMARY KEY ((race_year, race_name), rank)
);
where the partition key is a composite of race_year
and race_name
.
The following query returns the top two cyclists from each partition stored:
SELECT rank, cyclist_name AS name FROM cycling.rank_by_year_and_name
PER PARTITION LIMIT 2;
Results
rank | name
------+----------------------
1 | Phillippe GILBERT
2 | Daniel MARTIN
1 | Daniel MARTIN
2 | Johan Esteban CHAVES
1 | Ilnur ZAKARIN
2 | Carlos BETANCUR
1 | Benjamin PRADES
2 | Adam PHELAN
(8 rows)
ALLOW FILTERING
The ALLOW FILTERING
clause allows you to perform queries that require scanning all partitions, with no primary key columns specified.
It should not be used in production as it can cause severe performance issues!
When initially modeling your data, you should avoid using ALLOW FILTERING
and instead model your data to avoid it.
However, for a small dataset or for testing purposes, it can be useful.
It may even help you identify where you need to add indexes to your data model.
For more information, see Allow Filtering explained.
The following query selects the birthday
and nationality
columns from the cyclist_alt_stats
table, with the ALLOW FILTERING
clause:
SELECT lastname, birthday, nationality FROM cycling.cyclist_alt_stats
WHERE birthday = '1991-08-25' AND nationality = 'Ethiopia'
ALLOW FILTERING;
Results
lastname | birthday | nationality
----------+------------+-------------
GRMAY | 1991-08-25 | Ethiopia
(1 rows)
Usage notes
The WHERE
clause is the key to filtering rows returned by the SELECT
statement.
If indexes exist for columns in a table, use the indexed columns in the WHERE
clause to improve query performance.
The query evaluates the WHERE
logical statements hierarchically.
First, partition keys are evaluated, followed by clustering columns, and then regular columns.
Partition keys
The query requires that all partitions are restricted, except when querying with an index. Use logic statements that identify the partition key columns with these operators:
-
equals (=): Any partition key column.
-
IN: Restricted to the last column of the partition key to search multiple partitions.
-
range (>=, ⇐, >, and <) on tokens: Fully tokenized partition key (all partition key columns specified in order as arguments of the token function). Use token ranges to scan data stored on a particular node.
See Partition keys for examples and instructions.
Clustering columns
Use logic statements that identify the clustering segment. Clustering columns set the sort order of the stored data, which is nested when there are multiple clustering columns. After evaluating the partition key, the query evaluates the clustering statements in the nested order, the first (top level), second, third, and so on. All operators are supported in logical statements if the table has only one clustering column. To efficiently locate the data within the partition for tables with multiple clustering columns, the following restrictions apply:
-
All clustering columns excluding the last clustering column:
-
Equals (=)
-
IN
-
-
Last clustering column:
-
All equality and inequality operators, and multi-column comparisons
-
Clustering column logic statements also support returning slices across clustering segments. The slice identifies the row that has the corresponding values and allows you to return all rows before, after, or between (when two slice statements are included).
See Clustering columns for examples and instructions.
Regular columns
The database allows retrieving data using a regular or static column if an index exists for the column.
Otherwise, the IN
clause works if ALLOW FILTERING
is used in conjunction.
The ALLOW FILTERING
option overrides restrictions on filtering partition columns, clustering columns, and regular columns, but can negatively impact performance, causing read latencies.
Examples
Detailed examples can be found for various filtering conditions:
Filter | Example links |
---|---|
All columns |
|
Transform columns (JSON, DISTINCT, CAST, COUNT, functions, aggregates) |
|
Specific columns |
|
Column metadata (TTL, WRITETIME) |
|
Arithmetic operators |
|
Non-arithmetic operators (IN, CONTAINS, CONTAINS KEY) |
|
Vector search with ORDER BY |
|
LIMIT and OFFSET |