SELECT
Returns data from a single table.
A SELECT statement without a WHERE clause is not recommended because all rows from all partitions are returned.
Queries that span multiple partitions can seriously impact performance.
Syntax
SELECT [ JSON | DISTINCT ] <selectors>
FROM [<keyspace_name>.]<table_name>
[ WHERE [ <primary_key_conditions> | <non_primary_key_conditions> IN (<column_name> [, ...]) ALLOW FILTERING ] [ AND ] [ <index_conditions> ]
[ GROUP BY <column_name> [ , ... ] ]
[ ORDER BY <column_name> ( ASC | DESC ) [ , ... ] ] |
[ ORDER BY <vector_column_name> ANN OF [n,n,n,...] [ LIMIT N ] ]
[ ( LIMIT <N> [ OFFSET <N> ] | PER PARTITION LIMIT <N> ) ]
[ ALLOW FILTERING ] ;
Syntax legend
| Syntax conventions | Description |
|---|---|
UPPERCASE |
Literal keyword. |
Lowercase |
Not literal. |
|
Variable value. Replace with a user-defined value. |
|
Optional.
Square brackets ( |
|
Group.
Parentheses ( |
|
Or.
A vertical bar ( |
|
Repeatable.
An ellipsis ( |
|
Single quotation ( |
|
Map collection.
Braces ( |
Set, list, map, or tuple.
Angle brackets ( |
|
|
End CQL statement.
A semicolon ( |
|
Separate the command line options from the command arguments with two hyphens ( |
|
Search CQL only: Single quotation marks ( |
|
Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrConfig files. |
Parameters
| Parameter | Description | Default |
|---|---|---|
|
Optional. Name of the keyspace that contains the table to select. |
If no name is specified, the current keyspace is used. |
|
Name of the table to select. |
|
|
Name of the column to select. |
|
|
Name of the vector column to select. |
Selectors
The SELECT statement is used to query data from a table in CQL.
The results are returned as a set of rows.
Selectors are used to specify which columns are queried and returned.
The SELECT statement can also transform the data on return using functions.
Table data also has related metadata that can be returned using metadata functions.
Return all columns
To return all columns from a table, use the wildcard * selector.
SELECT * FROM cycling.cyclist_name;
Results
id | firstname | lastname
--------------------------------------+-----------+-----------------
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | Alex | FRAME
fb372533-eb95-4bb4-8685-6ef61e994caa | Michael | MATTHEWS
5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | Marianne | VOS
220844bf-4860-49d6-9a4b-6b5d3a79cbfb | Paolo | TIRALONGO
6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | Steven | KRUIKSWIJK
e7cd5752-bc0d-4157-a80f-7523add8dbcd | Anna | VAN DER BREGGEN
(6 rows)
id | firstname | lastname
--------------------------------------+-----------+-----------------
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | Alex | FRAME
fb372533-eb95-4bb4-8685-6ef61e994caa | Michael | MATTHEWS
5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | Marianne | VOS
220844bf-4860-49d6-9a4b-6b5d3a79cbfb | Paolo | TIRALONGO
6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | Steven | KRUIKSWIJK
e7cd5752-bc0d-4157-a80f-7523add8dbcd | Anna | VAN DER BREGGEN
(6 rows)
Since the * selector returns all columns, it is not necessary to specify the column names.
The more columns a table has, the more resources are required to return the data.
Return specific columns
Because the number of columns returned can affect performance, it is recommended to specify the columns to return, if possible.
For example, to return only the lastname column from the cycling.cyclist_name table:
SELECT lastname FROM cycling.cyclist_name;
Results
lastname
-----------------
FRAME
MATTHEWS
VOS
TIRALONGO
KRUIKSWIJK
VAN DER BREGGEN
(6 rows)
lastname
-----------------
FRAME
MATTHEWS
VOS
TIRALONGO
KRUIKSWIJK
VAN DER BREGGEN
(6 rows)
To return multiple columns, separate the column names with a comma:
SELECT lastname, firstname FROM cycling.cyclist_name;
Results
lastname | firstname
-----------------+-----------
FRAME | Alex
MATTHEWS | Michael
VOS | Marianne
TIRALONGO | Paolo
KRUIKSWIJK | Steven
VAN DER BREGGEN | Anna
(6 rows)
lastname | firstname
-----------------+-----------
FRAME | Alex
MATTHEWS | Michael
VOS | Marianne
TIRALONGO | Paolo
KRUIKSWIJK | Steven
VAN DER BREGGEN | Anna
(6 rows)
Columns can be renamed on return using an alias:
SELECT id, cyclist_age AS age FROM cycling.cyclist_alt_stats LIMIT 3;
Results
id | age
--------------------------------------+------
e0953617-07eb-4c82-8f91-3b2757981625 | null
1ba0417d-62da-4103-b710-de6fb222db6f | 34
4ceb495c-55ab-4f71-83b9-81117252bf13 | 34
(3 rows)
|
A column alias cannot be used in the |
Return transformed data
The SELECT statement can transform the data on return using functions.
The functions are varied and can be used to manipulate the data in different ways.
For example, JSON can be used to return all columns as JSON, toJson to return selected column as JSON, or CAST can be used to change the data type of a column.
Distinct column values can be returned using the DISTINCT function.
The number of rows with a particular column can be returned using the COUNT function.
Any native or user-defined functions can be used to transform the data, such as the SUM or AVG function.
Any user-defined aggregate function can be used to transform the data, such as the average function.
The TOKEN function can be used to return the token value of a column.
To output selected data from a table in JSON format, use the JSON function:
SELECT JSON * FROM cycling.birthday_list;
Results
[json]
----------------------------------------------------------------------------------------------------------
{"cyclist_name": "Claudio HEINEN", "blist": {"age": "23", "bday": "27/07/1992", "nation": "GERMANY"}}
{"cyclist_name": "Claudio VANDELLI", "blist": {"age": "54", "bday": "27/07/1961", "nation": "ITALY"}}
{"cyclist_name": "Luc HAGENAARS", "blist": {"age": "28", "bday": "27/07/1987", "nation": "NETHERLANDS"}}
{"cyclist_name": "Toine POELS", "blist": {"age": "52", "bday": "27/07/1963", "nation": "NETHERLANDS"}}
{"cyclist_name": "Allan DAVIS", "blist": {"age": "35", "bday": "27/07/1980", "nation": "AUSTRALIA"}}
{"cyclist_name": "Laurence BOURQUE", "blist": {"age": "23", "bday": "27/07/1992", "nation": "CANADA"}}
(6 rows)
Note the nested JSON structure in the output, with the column name as the key and the column value as the value.
To specify the JSON format for a particular column, enclose its name in the toJson() function:
SELECT id, commenter, toJson(created_at) FROM cycling.comments_vs;
Results
id | commenter | system.tojson(created_at)
----+-----------+---------------------------
(0 rows)
Note that the JSON-formatted timestamp column can include complete time zone information.
To cast a column to a different data type, use the CAST function:
SELECT CAST(created_at AS date) FROM cycling.comments;
Results
cast(created_at as date)
--------------------------
2023-04-01
2024-06-07
(2 rows)
If a table has duplicate values in a column, use the DISTINCT function to return only the unique values:
SELECT DISTINCT country FROM cycling.country_flag;
Results
country
---------
Belgium
France
(2 rows)
You may want to count the number of rows in a column.
Use the COUNT function to return the count:
SELECT start_month, MAX(start_day), COUNT(*) FROM cycling.events
WHERE year = 2017 AND discipline = 'Cyclo-cross';
Results
start_month | system.max(start_day) | count
-------------+-----------------------+-------
null | null | 0
(1 rows)
start_month | system.max(start_day) | count
-------------+-----------------------+-------
1 | 28 | 11
(1 rows)
To use the COUNT function with a WHERE clause, include the COUNT function in the SELECT list and the WHERE clause in the query.
For example, to get the number of cyclists from Belgium:
SELECT count(cyclist_name) FROM cycling.country_flag
WHERE country='Belgium';
Results
system.count(cyclist_name)
----------------------------
2
(1 rows)
A number of native functions can be used to transform the data.
The data is transformed only on return, rather than stored in the table.
These are general functions that are familiar to most people, such as the average, or AVG function:
SELECT AVG(race_points) FROM cycling.cyclist_points
WHERE id = e3b19ec4-774a-4d1c-9e5a-decec1e30aac;
Results
system.avg(race_points)
-------------------------
67
(1 rows)
User-defined functions can also be used to transform the data. For examples, the 'left` function can be used to return a column that is left-justified:
SELECT left(firstname, 1), lastname FROM cycling.cyclist_name;
Results
User-defined aggregates can also be used to transform the data.
For examples, the 'average` function can be used to retrieve the average of the column cyclist_time_sec from a table:
SELECT average(cyclist_time_sec) AS Average FROM cycling.team_average
WHERE team_name = 'UnitedHealthCare Pro Cycling Womens Team';
Results
The TOKEN function can be used to return the token value of a column.
The token value is the hashed value of the partition key, thus it can only be used on the partition key column or columns.
This function is useful when using the token value in a query to filter data with a WHERE clause.
To return the token value of the race_id column in the cycling.race table:
SELECT race_name AS name,race_year AS year FROM cycling.rank_by_year_and_name
WHERE TOKEN(race_year,race_name) >= 4582455970709790046;
Results
name | year
--------------------------------------------+------
Tour of Japan - Stage 4 - Minami > Shinshu | 2015
Tour of Japan - Stage 4 - Minami > Shinshu | 2015
Tour of Japan - Stage 4 - Minami > Shinshu | 2015
(3 rows)
name | year
--------------------------------------------+------
Tour of Japan - Stage 4 - Minami > Shinshu | 2015
Tour of Japan - Stage 4 - Minami > Shinshu | 2015
Tour of Japan - Stage 4 - Minami > Shinshu | 2015
(3 rows)
Return column metadata
Each non-primary key column in a table has metadata associated with it.
The WRITETIME function returns the timestamp of the last write to a column.
The TTL function returns the time to live of a column.
It is important to note that the metadata functions are not available for all columns.
For example, the metadata functions are not available for columns with a data type of counter or primary key columns.
Additionally, the metadata functions are not available for columns that are part of a collection or a user-defined type that is not frozen.
To return the writetime of the firstname column in the cycling.cyclist_points table:
SELECT WRITETIME (firstname) FROM cycling.cyclist_points
WHERE id = e3b19ec4-774a-4d1c-9e5a-decec1e30aac;
Results
writetime(firstname)
----------------------
1718927710224973
1718927710227830
1718927710231004
(3 rows)
The TTL function returns the time to live of a column.
This function is useful when using Time to Live (TTL) to expire data in a table.
If a TTL is set on a column, the data is automatically deleted after the specified time has elapsed.
To return the TTL of the race_points column in the cycling.cyclist_points table:
SELECT TTL(race_name) FROM cycling.calendar WHERE race_id = 200;
Results
ttl(race_name)
----------------
199
(1 rows)
ttl(race_name)
----------------
200
(1 rows)
Options
GROUP BY clause
Group by one or more columns.
Condenses the selected rows that share the same values for a set of columns or values returned by a function into a group.
Either one or more primary key columns or a deterministic function or aggregate can be used in the GROUP BY clause.
SELECT race_date, race_time FROM cycling.race_times_summary
GROUP BY race_date;
Results
race_date | race_time
------------+--------------------
2019-03-21 | 10:01:18.000000000
2018-07-26 | 10:01:18.000000000
2017-04-14 | 10:01:18.000000000
(3 rows)
Warnings :
Aggregation query used without partition key
Each set of rows with the same race_date column value are grouped together into one row in the query output.
Three rows are returned because there are three groups of rows with the same race_date column value.
The value returned is the first value that is found for the group.
This query groups the rows by race_date and FLOOR(race_time, 1h), which returns the hour.
The number of rows in each group is returned by COUNT(*).
SELECT race_date, FLOOR(race_time, 1h), COUNT(*) FROM cycling.race_times_summary
GROUP BY race_date, FLOOR(race_time, 1h);
Results
race_date | system.floor(race_time, 1h) | count
------------+-----------------------------+-------
2019-03-21 | 10:00:00.000000000 | 2
2019-03-21 | 11:00:00.000000000 | 1
2019-03-21 | 12:00:00.000000000 | 1
2018-07-26 | 10:00:00.000000000 | 2
2018-07-26 | 11:00:00.000000000 | 1
2018-07-26 | 12:00:00.000000000 | 1
2017-04-14 | 10:00:00.000000000 | 2
2017-04-14 | 11:00:00.000000000 | 1
2017-04-14 | 12:00:00.000000000 | 1
(9 rows)
Warnings :
Aggregation query used without partition key
Nine rows are returned because there are nine groups of rows with the same race_date and FLOOR(race_time, 1h) values.
ORDER BY clause
You can fine-tune the display order using the ORDER BY clause.
The partition key must be defined in the WHERE clause and then the ORDER BY clause defines one or more clustering columns to use for ordering.
The order of the specified columns must match the order of the clustering columns in the PRIMARY KEY definition.
The options for ordering are ASC (ascending) and DESC (descending).
If no order is specified, the results are returned in the stored order.
|
Note that using both |
SELECT * FROM cycling.calendar WHERE race_id IN (100, 101, 102)
ORDER BY race_start_date ASC;
Results
The ORDER BY clause also supports vector searches of the vector column.
The result set is sorted using the Approximate Nearest Neighbor (ANN) algorithm with the supplied array values.
LIMIT clause
If a query returns a large number of rows, you can limit the number of rows returned, to limit the amount of data returned.
The default limit is set to 10,000 rows, the number of rows cqlsh allows.
This examples limits the rows to 3:
SELECT * FROM cycling.comments_vs
ORDER BY comment_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55]
LIMIT 3;
Results
PER PARTITION LIMIT clause
The PER PARTITION LIMIT option sets the maximum number of rows that the query returns from each partition.
This will only apply to tables that spread across more than one partition.
An example of such a table is defined here:
USE cycling;
CREATE TABLE rank_by_year_and_name (
race_year int,
race_name text,
cyclist_name text,
rank int,
PRIMARY KEY ((race_year, race_name), rank)
);
where the partition key is a composite of race_year and race_name.
The following query returns the top two cyclists from each partition stored:
SELECT rank, cyclist_name AS name FROM cycling.rank_by_year_and_name
PER PARTITION LIMIT 2;
Results
rank | name
------+----------------------
1 | Phillippe GILBERT
2 | Daniel MARTIN
1 | Daniel MARTIN
2 | Johan Esteban CHAVES
1 | Ilnur ZAKARIN
2 | Carlos BETANCUR
1 | Benjamin PRADES
2 | Adam PHELAN
(8 rows)
rank | name
------+----------------------
1 | Phillippe GILBERT
2 | Daniel MARTIN
1 | Daniel MARTIN
2 | Johan Esteban CHAVES
1 | Ilnur ZAKARIN
2 | Carlos BETANCUR
1 | Benjamin PRADES
2 | Adam PHELAN
(8 rows)
ALLOW FILTERING
|
Don’t use |
The ALLOW FILTERING clause allows you to perform queries that require scanning all partitions, with no primary key columns specified.
These queries can be extremely long running and resource intensive, resulting in severe performance issues.
During data modeling, avoid queries that require ALLOW FILTERING.
Instead, model your data and queries to avoid it.
For small datasets or testing purposes on small datasets, it can be useful, and it could help you identify where you need to add indexes to your data model.
The following query selects the birthday and nationality columns from the cyclist_alt_stats table, with the ALLOW FILTERING clause:
SELECT lastname, birthday, nationality FROM cycling.cyclist_alt_stats
WHERE birthday = '1991-08-25' AND nationality = 'Ethiopia'
ALLOW FILTERING;
Results
lastname | birthday | nationality
----------+------------+-------------
GRMAY | 1991-08-25 | Ethiopia
(1 rows)
Usage notes
The WHERE clause is the key to filtering rows returned by the SELECT statement.
If indexes exist for columns in a table, use the indexed columns in the WHERE clause to improve query performance.
The query evaluates the WHERE logical statements hierarchically.
First, partition keys are evaluated, followed by clustering columns, and then regular columns.
Partition keys
The query requires that all partitions are restricted, except when querying with an index. Use logic statements that identify the partition key columns with these operators:
-
equals (=): Any partition key column.
-
IN: Restricted to the last column of the partition key to search multiple partitions.
-
range (>=, ⇐, >, and <) on tokens: Fully tokenized partition key (all partition key columns specified in order as arguments of the token function). Use token ranges to scan data stored on a particular node.
See Partition keys for examples and instructions.
Clustering columns
Use logic statements that identify the clustering segment. Clustering columns set the sort order of the stored data, which is nested when there are multiple clustering columns. After evaluating the partition key, the query evaluates the clustering statements in the nested order, the first (top level), second, third, and so on. All operators are supported in logical statements if the table has only one clustering column. To efficiently locate the data within the partition for tables with multiple clustering columns, the following restrictions apply:
-
All clustering columns excluding the last clustering column:
-
Equals (=)
-
IN
-
-
Last clustering column:
-
All equality and inequality operators, and multi-column comparisons
-
Clustering column logic statements also support returning slices across clustering segments. The slice identifies the row that has the corresponding values and allows you to return all rows before, after, or between (when two slice statements are included).
See Clustering columns for examples and instructions.
Regular columns
The database allows retrieving data using a regular or static column if an index exists for the column.
Otherwise, the IN clause works if ALLOW FILTERING is used in conjunction.
The ALLOW FILTERING option overrides restrictions on filtering partition columns, clustering columns, and regular columns, but can negatively impact performance, causing read latencies.
Examples
Detailed examples can be found for various filtering conditions:
| Filter | Example links |
|---|---|
All columns |
|
Transform columns (JSON, DISTINCT, CAST, COUNT, functions, aggregates) |
|
Specific columns |
|
Column metadata (TTL, WRITETIME) |
|
Arithmetic operators |
|
Non-arithmetic operators (IN, CONTAINS, CONTAINS KEY) |
|
Vector search with ORDER BY |