SELECT

Returns data from a single table. A SELECT statement without a WHERE clause is not recommended because all rows from all partitions are returned. Queries that span multiple partitions can seriously impact performance.

Syntax

SELECT [ JSON | DISTINCT ] <selectors>
  FROM [<keyspace_name>.]<table_name>
  [ WHERE [ <primary_key_conditions> | <non_primary_key_conditions> IN (<column_name> [, ...]) ALLOW FILTERING ] [ AND ] [ <index_conditions> ]
  [ GROUP BY <column_name> [ , ... ] ]
  [ ORDER BY <column_name> ( ASC | DESC ) [ , ... ] ] |
  [ ORDER BY <vector_column_name> ANN OF [n,n,n,...] [ LIMIT N ] ]
  [ ( LIMIT <N> [ OFFSET <N> ]  | PER PARTITION LIMIT <N> ) ]
  [ ALLOW FILTERING ] ;

Legend
Syntax conventions	Description
UPPERCASE	Literal keyword.
Lowercase	Not literal.
`< >`	Variable value. Replace with a user-defined value.
`[]`	Optional. Square brackets (`[]`) surround optional command arguments. Do not type the square brackets.
`( )`	Group. Parentheses ( `( )` ) identify a group to choose from. Do not type the parentheses.
`\|`	Or. A vertical bar (`\|`) separates alternative elements. Type any one of the elements. Do not type the vertical bar.
`...`	Repeatable. An ellipsis ( `...` ) indicates that you can repeat the syntax element as often as required.
`'<Literal string>'`	Single quotation (`'`) marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.
`{ <key> : <value> }`	Map collection. Braces (`{ }`) enclose map collections or key value pairs. A colon separates the key and the value.
`<<datatype1>,<datatype2>>`	Set, list, map, or tuple. Angle bracke ts ( `< >` ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.
`<cql_statement>;`	End CQL statement. A semicolon (`;`) terminates all CQL statements.
`[--]`	Separate the command line options from the command arguments with two hyphens ( `--` ). This syntax is useful when arguments might be mistaken for command line options.
`' <<schema\> ... </schema\>> '`	Search CQL only: Single quotation marks (`'`) surround an entire XML schema declaration.
`@<xml_entity>='<xml_entity_type>'`	Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrConfig files.

Parameters

Parameter Description Default

Parameter	Description	Default
`keyspace_name`	Optional. Name of the keyspace that contains the table to select.	If no name is specified, the current keyspace is used.
`table_name`	Name of the table to select.
`column_name`	Name of the column to select.
`vector_column_name`	Name of the vector column to select.

keyspace_name

Optional. Name of the keyspace that contains the table to select.

If no name is specified, the current keyspace is used.

table_name

Name of the table to select.

column_name

Name of the column to select.

vector_column_name

Name of the vector column to select.

Selectors

The SELECT statement is used to query data from a table in CQL. The results are returned as a set of rows. Selectors are used to specify which columns are queried and returned. The SELECT statement can also transform the data on return using functions. Table data also has related metadata that can be returned using metadata functions.

Return all columns

To return all columns from a table, use the wildcard * selector.

SELECT * FROM cycling.cyclist_name;

Result

 id                                   | firstname | lastname
--------------------------------------+-----------+-----------------
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 |      Alex |           FRAME
 fb372533-eb95-4bb4-8685-6ef61e994caa |   Michael |        MATTHEWS
 5b6962dd-3f90-4c93-8f61-eabfa4a803e2 |  Marianne |             VOS
 220844bf-4860-49d6-9a4b-6b5d3a79cbfb |     Paolo |       TIRALONGO
 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 |    Steven |      KRUIKSWIJK
 e7cd5752-bc0d-4157-a80f-7523add8dbcd |      Anna | VAN DER BREGGEN

(6 rows)

Since the * selector returns all columns, it is not necessary to specify the column names. The more columns a table has, the more resources are required to return the data.

Return specific columns

Because the number of columns returned can affect performance, it is recommended to specify the columns to return, if possible.

For example, to return only the lastname column from the cycling.cyclist_name table:

SELECT lastname FROM cycling.cyclist_name;

Result

 lastname
-----------------
           FRAME
        MATTHEWS
             VOS
       TIRALONGO
      KRUIKSWIJK
 VAN DER BREGGEN

(6 rows)

To return multiple columns, separate the column names with a comma:

SELECT lastname, firstname FROM cycling.cyclist_name;

Result

 lastname        | firstname
-----------------+-----------
           FRAME |      Alex
        MATTHEWS |   Michael
             VOS |  Marianne
       TIRALONGO |     Paolo
      KRUIKSWIJK |    Steven
 VAN DER BREGGEN |      Anna

(6 rows)

Columns can be renamed on return using an alias:

SELECT id, cyclist_age AS age FROM cycling.cyclist_alt_stats LIMIT 3;

Result

 id                                   | age
--------------------------------------+------
 e0953617-07eb-4c82-8f91-3b2757981625 | null
 1ba0417d-62da-4103-b710-de6fb222db6f |   34
 4ceb495c-55ab-4f71-83b9-81117252bf13 |   34

(3 rows)

A column alias cannot be used in the WHERE clause; use the original column name.

Return transformed data

The SELECT statement can transform the data on return using functions. The functions are varied and can be used to manipulate the data in different ways. For example, JSON can be used to return all columns as JSON, toJson to return selected column as JSON, or CAST can be used to change the data type of a column. Distinct column values can be returned using the DISTINCT function. The number of rows with a particular column can be returned using the COUNT function. Any native or user-defined functions can be used to transform the data, such as the SUM or AVG function. Any user-defined aggregate function can be used to transform the data, such as the average function. The TOKEN function can be used to return the token value of a column.

To output selected data from a table in JSON format, use the JSON function:

SELECT JSON * FROM cycling.birthday_list;

Result

 [json]
----------------------------------------------------------------------------------------------------------
    {"cyclist_name": "Claudio HEINEN", "blist": {"age": "23", "bday": "27/07/1992", "nation": "GERMANY"}}
    {"cyclist_name": "Claudio VANDELLI", "blist": {"age": "54", "bday": "27/07/1961", "nation": "ITALY"}}
 {"cyclist_name": "Luc HAGENAARS", "blist": {"age": "28", "bday": "27/07/1987", "nation": "NETHERLANDS"}}
   {"cyclist_name": "Toine POELS", "blist": {"age": "52", "bday": "27/07/1963", "nation": "NETHERLANDS"}}
     {"cyclist_name": "Allan DAVIS", "blist": {"age": "35", "bday": "27/07/1980", "nation": "AUSTRALIA"}}
   {"cyclist_name": "Laurence BOURQUE", "blist": {"age": "23", "bday": "27/07/1992", "nation": "CANADA"}}

(6 rows)

Note the nested JSON structure in the output, with the column name as the key and the column value as the value.

To specify the JSON format for a particular column, enclose its name in the toJson() function. A JSON-formatted timestamp column can include complete time zone information.

SELECT id, commenter, toJson(created_at) FROM cycling.comments_vs;

Result

 id                                   | commenter | system.tojson(created_at)
--------------------------------------+-----------+----------------------------
 e8ae5cf3-d358-4d99-b900-85902fda9bb0 |      John | "2017-04-01 14:33:02.160Z"
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 |      Alex | "2017-04-01 14:33:02.160Z"
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 |      Alex | "2017-03-21 21:11:09.999Z"
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 |      Alex | "2017-02-14 20:43:20.000Z"
 c7fceba0-c141-4207-9494-a29f9809de6f |       Amy | "2024-06-07 03:04:35.620Z"
 c7fceba0-c141-4207-9494-a29f9809de6f |       Amy | "2017-04-01 13:43:08.030Z"
 c7fceba0-c141-4207-9494-a29f9809de6f |       Amy | "2017-03-22 01:16:59.001Z"
 c7fceba0-c141-4207-9494-a29f9809de6f |       Amy | "2017-02-17 08:43:20.234Z"
 c7fceba0-c141-3207-9494-a29f9809de6f |       Amy | "2017-02-17 08:43:20.234Z"
 c7fceba0-c141-7207-9494-a29f9809de6f |       Amy | "2024-06-07 03:04:35.624Z"
 e8ae5df3-d358-4d99-b900-85902fda9bb0 |      Jane | "2017-04-01 14:33:02.160Z"
 c7fceba0-c141-4207-9594-a29f9809de6f |      Jane | "2017-02-17 08:43:20.234Z"

(12 rows)

To cast a column to a different data type, use the CAST function:

SELECT CAST(created_at AS date) FROM cycling.comments;

Result

 cast(created_at as date)
--------------------------
               2023-04-01
               2024-06-07

(2 rows)

If a table has duplicate values in a column, use the DISTINCT function to return only the unique values:

SELECT DISTINCT country FROM cycling.country_flag;

Result

 country
---------
 Belgium
  France

(2 rows)

To count the number of rows in a column, use the COUNT function:

SELECT start_month, MAX(start_day), COUNT(*) FROM cycling.events
  WHERE year = 2017 AND discipline = 'Cyclo-cross';

Result

 start_month | system.max(start_day) | count
-------------+-----------------------+-------
           1 |                    28 |    11

(1 rows)

To use the COUNT function with a WHERE clause, include the COUNT function in the SELECT list and the WHERE clause in the query. For example, to get the number of cyclists from Belgium:

SELECT count(cyclist_name) FROM cycling.country_flag
  WHERE country='Belgium';

Result

 system.count(cyclist_name)
----------------------------
                          2

(1 rows)

A number of native functions can be used to transform the data. The data is transformed only on return, rather than stored in the table. These are general functions that are familiar to most people, such as the average, or AVG function:

SELECT AVG(race_points) FROM cycling.cyclist_points
  WHERE id = e3b19ec4-774a-4d1c-9e5a-decec1e30aac;

Result

 system.avg(race_points)
-------------------------
                      67

(1 rows)

User-defined functions can also be used to transform the data:

SELECT left(firstname, 1), lastname FROM cycling.cyclist_name;

Result

 cycling.left(firstname, 1) | lastname
----------------------------+-----------------
                          A |           FRAME
                       null |        MATTHEWS
                       null |             VOS
                          P |       TIRALONGO
                          M |             VOS
                          S |      KRUIKSWIJK
                          A | VAN DER BREGGEN

(7 rows)

User-defined aggregates can also be used to transform the data. For examples, the 'average` function can be used to retrieve the average of the column cyclist_time_sec from a table:

SELECT average(cyclist_time_sec) AS Average FROM cycling.team_average
  WHERE team_name = 'UnitedHealthCare Pro Cycling Womens Team';

Result

 average
-------------
 11474.66667

(1 rows)

Warnings :
User defined function cycling.average_state : (tuple<int, bigint>, int) -> tuple<int, bigint> consumed more than 500µs CPU time (4183µs)

The TOKEN function can be used to return the token value of a column. The token value is the hashed value of the partition key, thus it can only be used on the partition key column or columns. This function is useful when using the token value in a query to filter data with a WHERE clause.

To return the token value of the race_id column in the cycling.race table:

SELECT race_name AS name,race_year AS year FROM cycling.rank_by_year_and_name
  WHERE TOKEN(race_year,race_name) >= 4582455970709790046;

Result

 name                                       | year
--------------------------------------------+------
 Tour of Japan - Stage 4 - Minami > Shinshu | 2015
 Tour of Japan - Stage 4 - Minami > Shinshu | 2015
 Tour of Japan - Stage 4 - Minami > Shinshu | 2015

(3 rows)

Return column metadata

Each non-primary key column in a table has metadata associated with it. The WRITETIME function returns the timestamp of the last write to a column. The TTL function returns the time to live of a column.

It is important to note that the metadata functions are not available for all columns. For example, the metadata functions are not available for columns with a data type of counter or primary key columns. Additionally, the metadata functions are not available for columns that are part of a collection or a user-defined type that is not frozen.

To return the writetime of the firstname column in the cycling.cyclist_points table:

SELECT WRITETIME(firstname) FROM cycling.cyclist_points
  WHERE id = e3b19ec4-774a-4d1c-9e5a-decec1e30aac;

Result

 writetime(firstname)
----------------------
     1719860014208536
     1719860014213440
     1719860014216346

(3 rows)

The TTL function returns the time to live of a column. This function is useful when using Time to Live (TTL) to expire data in a table. If a TTL is set on a column, the data is automatically deleted after the specified time has elapsed.

To return the TTL of the race_points column in the cycling.cyclist_points table:

SELECT TTL(race_name) FROM cycling.calendar WHERE race_id = 200;

Result

 ttl(race_name)
----------------
            199

(1 rows)

Options

GROUP BY clause

Group by one or more columns. Condenses the selected rows that share the same values for a set of columns or values returned by a function into a group. Either one or more primary key columns or a deterministic function or aggregate can be used in the GROUP BY clause.

SELECT race_date, race_time FROM cycling.race_times_summary
  GROUP BY race_date;

Each set of rows with the same race_date column value are grouped together into one row in the query output. Three rows are returned because there are three groups of rows with the same race_date column value. The value returned is the first value that is found for the group.

Result

 race_date  | race_time
------------+--------------------
 2019-03-21 | 10:01:18.000000000
 2018-07-26 | 10:01:18.000000000
 2017-04-14 | 10:01:18.000000000

(3 rows)

Warnings :
Aggregation query used without partition key

This query groups the rows by race_date and FLOOR(race_time, 1h), which returns the hour. The number of rows in each group is returned by COUNT(*).

SELECT race_date, FLOOR(race_time, 1h), COUNT(*) FROM cycling.race_times_summary
  GROUP BY race_date, FLOOR(race_time, 1h);

Nine rows are returned because there are nine groups of rows with the same race_date and FLOOR(race_time, 1h) values:

Result

 race_date  | system.floor(race_time, 1h) | count
------------+-----------------------------+-------
 2019-03-21 |          10:00:00.000000000 |     2
 2019-03-21 |          11:00:00.000000000 |     1
 2019-03-21 |          12:00:00.000000000 |     1
 2018-07-26 |          10:00:00.000000000 |     2
 2018-07-26 |          11:00:00.000000000 |     1
 2018-07-26 |          12:00:00.000000000 |     1
 2017-04-14 |          10:00:00.000000000 |     2
 2017-04-14 |          11:00:00.000000000 |     1
 2017-04-14 |          12:00:00.000000000 |     1

(9 rows)

Warnings :
Aggregation query used without partition key

ORDER BY clause

You can fine-tune the display order using the ORDER BY clause. The partition key must be defined in the WHERE clause and then the ORDER BY clause defines one or more clustering columns to use for ordering. The order of the specified columns must match the order of the clustering columns in the PRIMARY KEY definition. The options for ordering are ASC (ascending) and DESC (descending).

If no order is specified, the results are returned in the stored order.

Note that using both IN and ORDER BY require turning off paging with the PAGING OFF command in cqlsh.

SELECT * FROM cycling.calendar WHERE race_id IN (100, 101, 102)
ORDER BY race_start_date ASC;

Result

 race_id | race_start_date                 | race_end_date                   | race_name
---------+---------------------------------+---------------------------------+-----------------------
     100 | 2015-05-09 00:00:00.000000+0000 | 2015-05-31 00:00:00.000000+0000 |         Giro d'Italia
     100 | 2014-05-08 00:00:00.000000+0000 | 2014-05-30 00:00:00.000000+0000 |         Giro d'Italia
     100 | 2013-05-07 00:00:00.000000+0000 | 2014-05-29 00:00:00.000000+0000 |         Giro d'Italia
     102 | 2015-06-13 00:00:00.000000+0000 | 2015-06-21 00:00:00.000000+0000 |        Tour de Suisse
     102 | 2014-06-12 00:00:00.000000+0000 | 2014-06-20 00:00:00.000000+0000 |        Tour de Suisse
     102 | 2013-06-11 00:00:00.000000+0000 | 2013-06-19 00:00:00.000000+0000 |        Tour de Suisse
     101 | 2015-06-07 00:00:00.000000+0000 | 2015-06-14 00:00:00.000000+0000 | Criterium du Dauphine
     101 | 2014-06-06 00:00:00.000000+0000 | 2014-06-13 00:00:00.000000+0000 | Criterium du Dauphine
     101 | 2013-06-05 00:00:00.000000+0000 | 2013-06-12 00:00:00.000000+0000 | Criterium du Dauphine
     103 | 2015-07-04 00:00:00.000000+0000 | 2015-07-26 00:00:00.000000+0000 |        Tour de France
     103 | 2014-07-03 00:00:00.000000+0000 | 2014-07-25 00:00:00.000000+0000 |        Tour de France
     103 | 2013-07-02 00:00:00.000000+0000 | 2013-07-24 00:00:00.000000+0000 |        Tour de France

(12 rows)

The ORDER BY clause also supports vector searches of the vector column. The result set is sorted using the approximate nearest neighbor (ANN) algorithm with the supplied array values.

LIMIT clause

If a query returns a large number of rows, you can limit the number of rows returned, to limit the amount of data returned. The default limit is set to 10,000 rows, the number of rows cqlsh allows. This examples limits the rows to 1:

SELECT created_at FROM cycling.comments
  WHERE id = e7ae5cf3-d358-4d99-b900-85902fda9bb0
LIMIT 1;

Result

 created_at
---------------------------------
 2024-07-02 00:00:00.000000+0000

(1 rows)

PER PARTITION LIMIT clause

The PER PARTITION LIMIT option sets the maximum number of rows that the query returns from each partition. This will only apply to tables that spread across more than one partition. An example of such a table is defined here:

USE cycling;
CREATE TABLE rank_by_year_and_name (
  race_year int,
  race_name text,
  cyclist_name text,
  rank int,
  PRIMARY KEY ((race_year, race_name), rank)
);

where the partition key is a composite of race_year and race_name.

The following query returns the top two cyclists from each partition stored:

SELECT rank, cyclist_name AS name FROM cycling.rank_by_year_and_name
  PER PARTITION LIMIT 2;

Result

 rank | name
------+----------------------
    1 |    Phillippe GILBERT
    2 |        Daniel MARTIN
    1 |        Daniel MARTIN
    2 | Johan Esteban CHAVES
    1 |        Ilnur ZAKARIN
    2 |      Carlos BETANCUR
    1 |      Benjamin PRADES
    2 |          Adam PHELAN

(8 rows)

ALLOW FILTERING

Unless you have an explicit, tested use case, don’t use ALLOW FILTERING in production because it can cause severe performance issues.

The ALLOW FILTERING clause allows you to perform queries that require scanning all partitions, with no primary key columns specified. These queries can be extremely long running and resource intensive, resulting in severe performance issues.

During data modeling, avoid queries that require ALLOW FILTERING. Instead, model your data and queries to avoid it.

For small datasets or testing purposes on small datasets, it can be useful, and it could help you identify where you need to add indexes to your data model.

The following query selects the birthday and nationality columns from the cyclist_alt_stats table, with the ALLOW FILTERING clause:

SELECT lastname, birthday, nationality FROM cycling.cyclist_alt_stats
  WHERE birthday = '1991-08-25' AND nationality = 'Ethiopia'
ALLOW FILTERING;

Result

 lastname | birthday   | nationality
----------+------------+-------------
    GRMAY | 1991-08-25 |    Ethiopia

(1 rows)

Usage notes

The WHERE clause is the key to filtering rows returned by the SELECT statement. If indexes exist for columns in a table, use the indexed columns in the WHERE clause to improve query performance.

The query evaluates the WHERE logical statements hierarchically. First, partition keys are evaluated, followed by clustering columns, and then regular columns.

Partition keys

The query requires that all partitions are restricted, except when querying with an index. Use logic statements that identify the partition key columns with these operators:

equals (=): Any partition key column.
IN: Restricted to the last column of the partition key to search multiple partitions.
range (>=, ⇐, >, and <) on tokens: Fully tokenized partition key (all partition key columns specified in order as arguments of the token function). Use token ranges to scan data stored on a particular node.

See Partition keys for examples and instructions.

Clustering columns

Use logic statements that identify the clustering segment. Clustering columns set the sort order of the stored data, which is nested when there are multiple clustering columns. After evaluating the partition key, the query evaluates the clustering statements in the nested order, the first (top level), second, third, and so on. All operators are supported in logical statements if the table has only one clustering column. To efficiently locate the data within the partition for tables with multiple clustering columns, the following restrictions apply:

All clustering columns excluding the last clustering column:
- Equals (=)
- IN
Last clustering column:
- All equality and inequality operators, and multi-column comparisons

Clustering column logic statements also support returning slices across clustering segments. The slice identifies the row that has the corresponding values and allows you to return all rows before, after, or between (when two slice statements are included).

See Clustering columns for examples and instructions.

Regular columns

The database allows retrieving data using a regular or static column if an index exists for the column. Otherwise, the IN clause works if ALLOW FILTERING is used in conjunction.

The ALLOW FILTERING option overrides restrictions on filtering partition columns, clustering columns, and regular columns, but can negatively impact performance, causing read latencies.

Examples

Detailed examples can be found for various filtering conditions:

Filter	Example links
All columns	All columns
Transform columns (JSON, DISTINCT, CAST, COUNT, functions, aggregates)	Transform columns
Specific columns	Specific columns
Column metadata (TTL, WRITETIME)	Column metadata
Arithmetic operators	Arithmetic operators
Non-arithmetic operators (IN, CONTAINS, CONTAINS KEY)	Non-arithmetic operators
Vector search with ORDER BY	Vector search

Filter

Example links

All columns

Transform columns (JSON, DISTINCT, CAST, COUNT, functions, aggregates)

Transform columns

Specific columns

Column metadata (TTL, WRITETIME)

Column metadata

Arithmetic operators

Non-arithmetic operators (IN, CONTAINS, CONTAINS KEY)

Non-arithmetic operators

Vector search with ORDER BY

Vector search

SELECT

Syntax

Parameters

Selectors

Return all columns

Return specific columns

Return transformed data

Return column metadata

Options

GROUP BY clause

ORDER BY clause

LIMIT clause

PER PARTITION LIMIT clause

ALLOW FILTERING

Usage notes

Partition keys

Clustering columns

Regular columns

Examples

See also

Was this helpful?

Give Feedback