CQL quick reference
Provides a consolidated syntax list of Cassandra Query Language (CQL) commands for quick reference.
Provides a consolidated syntax list of Cassandra Query Language (CQL) commands for quick reference.
See this quick reference guide for other CQL versions: 5.1 | 6.0 | 6.7.
Download a printable CQL reference with the ten most frequently use CQL commands and a list of the CQL data types.
ALTER KEYSPACE
ALTER KEYSPACE keyspace_name
WITH REPLICATION = {replication_map}
[AND DURABLE_WRITES = true|false] ;
Learn more.
Example
Change the cycling
keyspace to NetworkTopologyStrategy
in
a single datacenter and turn off durable writes (not recommended). This example uses the
default datacenter name with a replication factor of 3.
ALTER KEYSPACE cycling WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 } AND DURABLE_WRITES = false ;
ALTER MATERIALIZED VIEW
ALTER MATERIALIZED VIEW [keyspace_name.]view_name WITH table_options [ AND table_options ... ] ;Learn more.
Examples
Modifying table properties
For an overview of properties that apply to materialized views, see table_options .
ALTER MATERIALIZED VIEW cycling.cyclist_by_age WITH comment = 'A most excellent and useful view' AND bloom_filter_fp_chance = 0.02;
Modifying compression and compaction
Use a property map to specify new properties for compression or compaction.
ALTER MATERIALIZED VIEW cycling.cyclist_by_age WITH compression = { 'sstable_compression' : 'DeflateCompressor', 'chunk_length_kb' : 64 } AND compaction = { 'class': 'SizeTieredCompactionStrategy', 'max_threshold': 64};
Changing caching
You can create and change caching properties using a property map.
This example changes the keys property to NONE
(the
default is ALL
) and changes the rows_per_partition
property to 15
.
ALTER MATERIALIZED VIEW cycling.cyclist_by_age WITH caching = { 'keys' : 'NONE', 'rows_per_partition' : '15' };
Viewing current materialized view properties
Use DESCRIBE MATERIALIZED VIEW to see all current properties.
DESCRIBE MATERIALIZED VIEW cycling.cyclist_by_age
CREATE MATERIALIZED VIEW cycling.cyclist_by_age AS
SELECT age, cid, birthday, country, name
FROM cycling.cyclist_mv
WHERE age IS NOT NULL AND cid IS NOT NULL
PRIMARY KEY (age, cid)
WITH CLUSTERING ORDER BY (cid ASC)
AND bloom_filter_fp_chance = 0.02
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = 'A most excellent and useful view'
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.DeflateCompressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
ALTER ROLE
ALTER ROLE role_name
[ WITH [ PASSWORD = 'password'
| LOGIN = (true | false)
| SUPERUSER = (true | false)
| OPTIONS = option_map ]
Learn more.
Example
ALTER ROLE coach WITH PASSWORD='bestTeam';
ALTER TABLE
ALTER TABLE [keyspace_name.]table_name [ ADD ( column_definition | column_definition_list ) ] [ DROP ( column | column_list | COMPACT STORAGE ) ] [ RENAME column_name TO column_name ] [ WITH table_properties [ , ... ] ] ;Learn more.
Examples
Specifying the table and keyspace
You can qualify the table name by prepending the name of its keyspace. For example, to
specify the teams
table in the cycling
keyspace:
ALTER TABLE cycling.teams ADD manager uuid;
Adding a column
To add a column to a table, use the ADD instruction; for example:
ALTER TABLE cycling.cyclist_races ADD firstname text;
To add a column of a collection type:
ALTER TABLE cycling.upcoming_calendar ADD events list<text>;
This operation does not validate the existing data.
You cannot use the ADD instruction to add:
- A column with the same name as an existing column.
- A static column if the table has no clustering columns and uses
COMPACT STORAGE
.
Dropping a column
ALTER TABLE cycling.basic_info DROP birth_year;
DROP removes the column from the table definition. The column becomes unavailable for queries immediately after it is dropped. The database drops the column data during the next compaction.
- If you drop a column then re-add it, DataStax Distribution of Apache Cassandra™ does not restore the values written before the column was dropped.
- Do not re-add a dropped column that contained timestamps generated by a client; you can re-add columns with timestamps generated by the write time facility.
- You cannot drop columns from tables defined with the COMPACT STORAGE option.
Renaming a column
- You can only rename clustering columns, which are part of the primary key.
- You cannot rename the partition key because the partition key determines the data
storage location on a node. If a different partition name is required, the table must
be recreated and the data migrated.Note: There are many restrictions when using RENAME because SSTables are immutable. To change the state of things on disk, everything must be rewritten.
- You can index a renamed column.
- You cannot rename a column if an index has been created on it.
- You cannot rename a static column.
Modifying table properties
- Single property name and value.
- Property map to set the names and values, as shown in the next section on compression and compaction.
This example uses the WITH instruction to modify the read_repair_chance property, which configures read repair for tables that use for a non-quorum consistency and how to change multiple properties using AND:
ALTER TABLE cyclist_mv WITH comment = 'ID, name, birthdate and country' AND read_repair_chance = 0.2;
Enclose a text property value in single quotation marks. You cannot modify properties of a table that uses COMPACT STORAGE.
Modifying compression and compaction
Use a property map to alter a table's compression or compaction setting:
ALTER TABLE cycling_comments WITH compression = { 'sstable_compression' : 'DeflateCompressor', 'chunk_length_kb' : 64 };
Enclose the name of each key in single quotes. If the value is a string, enclose this in quotes as well.
Changing caching
Set the number of rows per partition to store in the row cache for a table to 10 rows:
ALTER TABLE cycling.events WITH caching = { 'keys': 'NONE', 'rows_per_partition': 10 };
Reviewing the table definition
DESC TABLE cycling.events;
The table details including the column names are returned.
CREATE TABLE cycling.events ( month int, end timestamp, class text, title text, location text, start timestamp, type text, PRIMARY KEY (month, end, class, title) ) WITH CLUSTERING ORDER BY (end ASC, class ASC, title ASC) AND bloom_filter_fp_chance = 0.01 AND caching = {'keys': 'NONE', 'rows_per_partition': '10'} AND comment = '' AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'} AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1.0 AND dclocal_read_repair_chance = 0.1 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair_chance = 0.0 AND speculative_retry = '99PERCENTILE';
ALTER TYPE
ALTER TYPE field_name ( ADD field_name cql_datatype | RENAME field_name TO new_field_name [ AND field_name TO new_field_name ... ] ) ;Learn more.
Examples
Adding a field
To add a new field to a user-defined type, use ALTER TYPE and the ADD keyword. For existing UDTs, the field value is null.
ALTER TYPE fullname ADD middlename text ;
Changing a field name
To change the name of a field in a user-defined type, use the RENAME old_name TO new_name syntax. Rename multiple fields by separating the directives with AND.
ALTER TYPE cycling.fullname RENAME middlename TO middle AND lastname to last AND firstname to first;
DESC TYPE cycling.fullname
CREATE TYPE cycling.fullname ( first text, last text, middle text );
ALTER USER
ALTER USER user_name
WITH PASSWORD 'password'
[ ( SUPERUSER | NOSUPERUSER ) ]
Learn more.
Examples
ALTER USER moss WITH PASSWORD 'bestReceiver';
ALTER USER moss SUPERUSER;
BATCH
BEGIN [ ( UNLOGGED | COUNTER ) ] BATCH [ USING TIMESTAMP [ epoch_microseconds ] ] dml_statement [ USING TIMESTAMP [ epoch_microseconds ] ] ; [ dml_statement [ USING TIMESTAMP [ epoch_microseconds ] ] [ ; ... ] ] ; APPLY BATCH ;Learn more.
Examples
Applying a client supplied timestamp to all DMLs
Insert meals paid for Vera Adrian using the user-defined date when inserting the records:
BEGIN BATCH USING TIMESTAMP 1481124356754405 INSERT INTO cycling.cyclist_expenses ( cyclist_name, expense_id, amount, description, paid ) VALUES ( 'Vera ADRIAN', 2, 13.44, 'Lunch', true ); INSERT INTO cycling.cyclist_expenses ( cyclist_name, expense_id, amount, description, paid ) VALUES ( 'Vera ADRIAN', 3, 25.00, 'Dinner', true ); APPLY BATCH;
EXPAND ON
SELECT cyclist_name, expense_id, amount, WRITETIME(amount), description, WRITETIME(description), paid, WRITETIME(paid) FROM cycling.cyclist_expenses WHERE cyclist_name = 'Vera ADRIAN';
@ Row 1
------------------------+------------------
cyclist_name | Vera ADRIAN
expense_id | 2
amount | 13.44
writetime(amount) | 1481124356754405
description | Lunch
writetime(description) | 1481124356754405
paid | True
writetime(paid) | 1481124356754405
@ Row 2
------------------------+------------------
cyclist_name | Vera ADRIAN
expense_id | 3
amount | 25
writetime(amount) | 1481124356754405
description | Dinner
writetime(description) | 1481124356754405
paid | False
writetime(paid) | 1481124356754405
(2 rows)
If any DML statement in the batch uses compare-and-set (CAS) logic, for example the
following batch with IF NOT EXISTS
, an error is returned:
BEGIN BATCH USING TIMESTAMP 1481124356754405 INSERT INTO cycling.cyclist_expenses ( cyclist_name, expense_id, amount, description, paid ) VALUES ( 'Vera ADRIAN', 2, 13.44, 'Lunch', true ); INSERT INTO cycling.cyclist_expenses ( cyclist_name, expense_id, amount, description, paid ) VALUES ( 'Vera ADRIAN', 3, 25.00, 'Dinner', false ) IF NOT EXISTS; APPLY BATCH;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot provide custom timestamp for conditional BATCH"
Batching conditional updates
Batch conditional updates introduced as lightweight transactions. However, a batch containing conditional updates can only operate within a single partition, because the underlying Paxos implementation only works at partition-level granularity. If one statement in a batch is a conditional update, the conditional logic must return true, or the entire batch fails. If the batch contains two or more conditional updates, all the conditions must return true, or the entire batch fails. This example shows batching of conditional updates:
IF
conditional
clause.BEGIN BATCH INSERT INTO purchases ( user, balance ) VALUES ( 'user1', -8 ) IF NOT EXISTS; INSERT INTO purchases ( user, expense_id, amount, description, paid ) VALUES ( 'user1', 1, 8, 'burrito', false ); APPLY BATCH;
BEGIN BATCH UPDATE purchases SET balance = -208 WHERE user='user1' IF balance = -8; INSERT INTO purchases ( user, expense_id, amount, description, paid ) VALUES ( 'user1', 2, 200, 'hotel room', false ); APPLY BATCH;Conditional batches cannot provide custom timestamps.
UPDATE
and
DELETE
statements within a conditional batch cannot use
IN
conditions to filter rows.A continuation of this example shows how to use a static column with conditional updates in batch.
Batching counter updates
A batch of counters should use the COUNTER
option because, unlike other
writes in DataStax
Distribution of Apache Cassandra™, a counter update is not an idempotent operation.
BEGIN COUNTER BATCH UPDATE UserActionCounts SET total = total + 2 WHERE keyalias = 523; UPDATE AdminActionCounts SET total = total + 2 WHERE keyalias = 701; APPLY BATCH;Counter batches cannot include non-counter columns in the DML statements, just as a non-counter batch cannot include counter columns. Counter batch statements cannot provide custom timestamps.
CREATE AGGREGATE
CREATE [ OR REPLACE ] AGGREGATE [ IF NOT EXISTS ] [keyspace_name.]aggregate_name (cql_type) SFUNC udf_name STYPE cql_type FINALFUNC udf_name INITCOND init_value [ DETERMINISTIC ] ;Learn more.
Examples
- Set up a test table with
data:
CREATE TABLE cycling.team_average ( team_name text, cyclist_name text, cyclist_time_sec int, race_title text, PRIMARY KEY (team_name, race_title,cyclist_name) ); INSERT INTO cycling.team_average ( team_name, cyclist_name, cyclist_time_sec, race_title ) VALUES ( 'UnitedHealthCare Pro Cycling Womens Team','Katie HALL',11449, 'Amgen Tour of California Women''s Race presented by SRAM - Stage 1 - Lake Tahoe > Lake Tahoe' ); INSERT INTO cycling.team_average ( team_name, cyclist_name, cyclist_time_sec, race_title ) VALUES ( 'UnitedHealthCare Pro Cycling Womens Team','Linda VILLUMSEN',11485, 'Amgen Tour of California Women''s Race presented by SRAM - Stage 1 - Lake Tahoe > Lake Tahoe' ); INSERT INTO cycling.team_average ( team_name, cyclist_name, cyclist_time_sec, race_title ) VALUES ( 'UnitedHealthCare Pro Cycling Womens Team','Hannah BARNES',11490, 'Amgen Tour of California Women''s Race presented by SRAM - Stage 1 - Lake Tahoe > Lake Tahoe' ); INSERT INTO cycling.team_average ( team_name, cyclist_name, cyclist_time_sec, race_title ) VALUES ( 'Velocio-SRAM','Alena AMIALIUSIK',11451, 'Amgen Tour of California Women''s Race presented by SRAM - Stage 1 - Lake Tahoe > Lake Tahoe' ); INSERT INTO cycling.team_average ( team_name, cyclist_name, cyclist_time_sec, race_title ) VALUES ( 'Velocio-SRAM','Trixi WORRACK',11453, 'Amgen Tour of California Women''s Race presented by SRAM - Stage 1 - Lake Tahoe > Lake Tahoe' ); INSERT INTO cycling.team_average ( team_name, cyclist_name, cyclist_time_sec, race_title ) VALUES ( 'TWENTY16 presented by Sho-Air','Lauren KOMANSKI',11451, 'Amgen Tour of California Women''s Race presented by SRAM - Stage 1 - Lake Tahoe > Lake Tahoe' );
- Create a function with a state parameter as a tuple that counts the rows (by
incrementing 1 for each record) in the first position and finds the total by adding the
current row value to the existing subtotal the second position, and returns the updated
state.
CREATE OR REPLACE FUNCTION cycling.avgState ( state tuple<int,bigint>, val int ) CALLED ON NULL INPUT RETURNS tuple<int,bigint> LANGUAGE java AS $$ if (val != null) { state.setInt(0, state.getInt(0)+1); state.setLong(1, state.getLong(1)+val.intValue()); } return state; $$ ;
Note: Use a simple test to verify that your function works properly.CREATE TABLE cycling.test_avg ( id int PRIMARY KEY, state frozen<tuple<int, bigint>>, val int ); INSERT INTO cycling.test_avg ( id, state, val ) VALUES ( 1, (6, 9949), 51 ); INSERT INTO cycling.test_avg ( id, state, val ) VALUES ( 2, (79, 10000), 9999 );
SELECT state, avgState(state, val), val FROM cycling.test_avg;
The first value was incremented by one and the second value is the results of the initial state value and val.
state | cycling.avgstate(state, val) | val -------------+------------------------------+------ (6, 9949) | (7, 10000) | 51 (79, 10000) | (80, 19999) | 9999
- Create a function that divides the total value for the selected column by the number
of
records.
CREATE OR REPLACE FUNCTION cycling.avgFinal ( state tuple<int, bigint> ) CALLED ON NULL INPUT RETURNS double LANGUAGE java AS $$ double r = 0; if (state.getInt(0) == 0) return null; r = state.getLong(1); r/= state.getInt(0); return Double.valueOf(r); $$ ;
- Create the user-defined aggregate to calculate the average value in the
column:
CREATE OR REPLACE AGGREGATE cycling.average ( int ) SFUNC avgState STYPE tuple<int,bigint> FINALFUNC avgFinal INITCOND (0, 0) ;
- Test the function using a select
statement:
SELECT cycling.average(cyclist_time_sec) FROM cycling.team_average WHERE team_name = 'UnitedHealthCare Pro Cycling Womens Team' AND race_title = 'Amgen Tour of California Women''s Race presented by SRAM - Stage 1 - Lake Tahoe > Lake Tahoe';
CREATE FUNCTION
CREATE [ OR REPLACE ] FUNCTION [ IF NOT EXISTS ] [keyspace_name.]function_name (argument_list [ , ... ]) ( CALLED | RETURNS NULL ) ON NULL INPUT RETURNS cql_data_type [ DETERMINISTIC ] [ MONOTONIC [ ON argument_name ] ] LANGUAGE language_name AS 'code_block' ;Learn more.
Examples
Overwrite or create the fLog function that computes the logarithm of an input value.
CALLED ON NULL INPUT
ensures that the function will always be
executed.
CREATE OR REPLACE FUNCTION cycling.fLog ( input double ) CALLED ON NULL INPUT RETURNS double LANGUAGE java AS 'return Double.valueOf(Math.log(input.doubleValue()));' ;
CREATE OR REPLACE FUNCTION cycling.left ( column TEXT, num int ) RETURNS NULL ON NULL INPUT RETURNS text LANGUAGE javascript AS $$ column.substring(0, num) $$ ;
SELECT left(firstname, 1), lastname FROM cycling.cyclist_name;
cycling.left(firstname, 1) | lastname
----------------------------+-----------------
A | FRAME
M | MATTHEWS
P | TIRALONGO
S | KRUIKSWIJK
A | VAN DER BREGGEN
CREATE INDEX
CREATE INDEX [ IF NOT EXISTS ] index_name ON [keyspace_name.]table_name ([ ( KEYS | FULL ) ] column_name) (ENTRIES column_name);Learn more.
Examples
Creating an index on a clustering column
Define a table having a composite partition key, and then create an index on a clustering column.
CREATE TABLE mykeyspace.users ( userID uuid, fname text, lname text, email text, address text, zip int, state text, PRIMARY KEY ((userID, fname), state) ); CREATE INDEX ON mykeyspace.users (state);
Creating an index on a set or list collection
Create an index on a set or list collection column as you would any other column. Enclose the name of the collection column in parentheses at the end of the CREATE INDEX statement. For example, add a collection of phone numbers to the users table to index the data in the phones set.
ALTER TABLE users ADD phones set<text>; CREATE INDEX ON users (phones);
If the collection is a map, you can create an index on map values. Assume the users table contains this map data from the example of a todo map:
{'2014-10-2 12:10' : 'die' }
The map key, the timestamp, is located to the left of the colon, and the map value is located to the right of the colon, 'die'. Indexes can be created on both map keys and map entries .
Creating an index on map keys
You can create an index on map collection keys. If an index of the map values of the collection exists, drop that index before creating an index on the map collection keys.
CREATE INDEX todo_dates ON users (KEYS(todo));
To query the table, you can use CONTAINS KEY in WHERE
clauses.
Creating an index on the map entries
You can create an index on map entries. An ENTRIES
index can be created
only on a map column of a table that doesn't have an existing index.
CREATE INDEX entries_idx ON race (ENTRIES(race_wins));
To query the table, you can use a WHERE
clause.
Creating an index on a full collection
You can create an index on a full FROZEN
collection. An
FULL
index can be created on a set, list, or map column of a table that
doesn't have an existing index.
CREATE INDEX rnumbers_idx ON cycling.race_starts (FULL(rnumbers));
To query the table, you can use a WHERE
clause.
CREATE KEYSPACE
CREATE KEYSPACE [IF NOT EXISTS] keyspace_name
WITH REPLICATION = {replication_map}
[AND DURABLE_WRITES = true|false] ;
Learn more.
Examples
Create a keyspace for a single node evaluation cluster
Create cycling keyspace on a single node evaluation cluster:
CREATE KEYSPACE cycling WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
Create a keyspace NetworkTopologyStrategy on an evaluation cluster
CREATE KEYSPACE cycling WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 } ;
datacenter1
is the default datacenter name. To display the datacenter name, use nodetool
status
.nodetool statusThe node tool returns the data center name, rack name, host name, and IP address.
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 127.0.0.1 46.59 KB 256 100.0% dd867d15-6536-4922-b574-e22e75e46432 rack1
Create the cycling keyspace in an environment with multiple data centers
Set the replication factor for the Boston, Seattle, and Tokyo datacenters. The data center name must match the name configured in the snitch.
CREATE KEYSPACE "Cycling" WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'boston' : 3 , // Datacenter 1 'seattle' : 2 , // Datacenter 2 'tokyo' : 2 // Datacenter 3 };
Disabling durable writes
Disable write commit log for the cycling keyspace. Disabling the commit log increases the risk of data loss. Do not disable in SimpleStrategy environments.
CREATE KEYSPACE cycling WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 } AND DURABLE_WRITES = false ;
CREATE MATERIALIZED VIEW
CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name AS SELECT [ (column_list) ] FROM [keyspace_name.]table_name WHERE column_name IS NOT NULL [ AND column_name IS NOT NULL ... ] [ AND relation [ AND ... ] ] PRIMARY KEY (column_list) [ WITH [ table_properties ] [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] ;Learn more.
Examples
This section shows example scenarios that illustrate the use of materialized views.Basic example of a materialized view
Thiscyclist_base
table is used in the first example
scenario:CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text );The following materialized view
cyclist_by_age
uses the base table
cyclist_base
. The WHERE
clause ensures that only rows
whose age
and cid
columns are non-NULL are added to the
materialized view. In the materialized view, age
is the partition key, and
cid
is the clustering column. In the base table, cid
is
the partition
key.CREATE MATERIALIZED VIEW cycling.cyclist_by_age AS SELECT age, cid, birthday, country, name FROM cycling.cyclist_base WHERE age IS NOT NULL AND cid IS NOT NULL PRIMARY KEY (age, cid) WITH CLUSTERING ORDER BY (cid ASC) AND caching = { 'keys' : 'ALL', 'rows_per_partition' : '100' } AND comment = 'Based on table cyclist';The results of this query:
SELECT * FROM cycling.cyclist_by_age;are:
age | cid | birthday | country | name -----+--------------------------------------+------------+---------------+------------------- 28 | 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | 1987-06-07 | Netherlands | Steven KRUIKSWIJK 18 | 15a116fc-b833-4da6-ab9a-4a7775752836 | 1997-08-19 | United States | Adrien COSTA 18 | 18f471bf-f631-4bc4-a9a2-d6f6cf5ea503 | 1997-03-29 | Netherlands | Bram WELTEN 18 | ffdfa2a7-5fc6-49a7-bfdc-3fcdcfdd7156 | 1997-02-08 | Netherlands | Pascal EENKHOORN 22 | e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 1993-06-18 | New Zealand | Alex FRAME 27 | c9c9c484-5e4a-4542-8203-8d047a01b8a8 | 1987-09-04 | Brazil | Cristian EGIDIO 27 | d1aad83b-be60-47a4-bd6e-069b8da0d97b | 1987-09-04 | Germany | Johannes HEIDER 20 | 862cc51f-00a1-4d5a-976b-a359cab7300e | 1994-09-04 | Denmark | Joakim BUKDAL 38 | 220844bf-4860-49d6-9a4b-6b5d3a79cbfb | 1977-07-08 | Italy | Paolo TIRALONGO (9 rows)
Using a materialized view to perform queries that are not possible on a base table
The following scenario shows how to use a materialized view to perform queries that are not possible on a base table unless ALLOW FILTERING is used. ALLOW FILTERING is not recommended because of the performance degradation. This table stores the cycling team mechanic information:CREATE TABLE IF NOT EXISTS cycling.mechanic ( emp_id int, dept_id int, name text, age int, birthdate date, PRIMARY KEY (emp_id, dept_id) );The table contains these rows:
emp_id | dept_id | age | birthdate | name --------+---------+-----+------------+------------ 5 | 3 | 25 | 1996-10-04 | Lisa SMITH 1 | 1 | 21 | 1992-06-18 | Fred GREEN 2 | 1 | 22 | 1993-01-15 | John SMITH 4 | 2 | 24 | 1995-08-19 | Jack JONES 3 | 2 | 23 | 1994-02-07 | Jane DOE (5 rows)This materialized view selects the columns from the previous table and contains a different primary key from the table:
CREATE MATERIALIZED VIEW cycling.mechanic_view AS SELECT emp_id, dept_id, name, age, birthdate FROM cycling.mechanic WHERE emp_id IS NOT NULL AND dept_id IS NOT NULL AND name IS NOT NULL AND age IS NOT NULL AND birthdate IS NOT NULL PRIMARY KEY (age, emp_id, dept_id);This query retrieves the rows where the age is 21:
SELECT * FROM cycling.mechanic_view WHERE age = 21;The previous query cannot be run on the base table without ALLOW FILTERING. The output from the previous query is as follows:
age | emp_id | dept_id | birthdate | name -----+--------+---------+------------+------------ 21 | 1 | 1 | 1992-06-18 | Fred GREEN (1 rows)
CREATE ROLE
CREATE ROLE [ IF NOT EXISTS ] role_name [ WITH [ SUPERUSER = ( true | false ) ] [ [ AND ] LOGIN = ( true | false ) ] [ [ AND ] PASSWORD = 'role_password' ] [ [ AND ] OPTIONS = option_map ] ] ;Learn more.
Examples
Creating a login account
- Create a login role for coach.
CREATE ROLE coach WITH PASSWORD = 'All4One2day!' AND LOGIN = true;
Internal authentication requires the role to have a password. - Verify that the account works by logging
in:
LOGIN coach
- Enter the password at the prompt.
Password:
- The cqlsh prompt includes the role
name:
coach@cqlsh>
Creating a role
A best practice when using internal authentication is to create separate roles for permissions and login accounts. Once a role has been created it can be assigned as permission to another role, see GRANT for more details. Roles for externally authenticators users are mapped to the user's group name; LDAP mapping is case sensitive.
Create a role for the cycling keyspace administrator, that is a role that has full permission to only the cycling keyspace.
- Create the role:
CREATE ROLE cycling_admin;
At this point the role has no permissions. Manage permissions using GRANT and REVOKE.Note: A role can only modify permissions of another role and can only modify (GRANT or REVOKE) role permissions that it also has. - Assign the role full access to the cycling
keyspace:
GRANT ALL PERMISSIONS on KEYSPACE cycling to cycling_admin;
- Now assign the role to the
coach.
GRANT cycling_admin TO coach;
This allows you to manage the permissions of all cycling administrators by modifying the cycling_admin role. - View the coach's
permissions.
LIST ALL PERMISSIONS OF coach;
role | username | resource | permission ----------------+----------------+--------------------+------------ cycling_admin | cycling_admin | <keyspace cycling> | CREATE cycling_admin | cycling_admin | <keyspace cycling> | ALTER cycling_admin | cycling_admin | <keyspace cycling> | DROP cycling_admin | cycling_admin | <keyspace cycling> | SELECT cycling_admin | cycling_admin | <keyspace cycling> | MODIFY cycling_admin | cycling_admin | <keyspace cycling> | AUTHORIZE cycling_admin | cycling_admin | <all roles> | AUTHORIZE
Changing a password
ALTER ROLE coach WITH PASSWORD = 'NewPassword'
CREATE TABLE
CREATE TABLE [ IF NOT EXISTS ] [keyspace_name.]table_name ( column_definition [ , ... ] | PRIMARY KEY (column_list) ) [ WITH [ table_options ] [ [ AND ] CLUSTERING ORDER BY [ clustering_column_name order ] ] [ [ AND ] ID = 'table_hash_tag' ] [ [ AND ] COMPACT STORAGE ] ;Learn more.
Create a table that has a frozen user-defined type.
CREATE TABLE cycling.race_winners ( race_name text, race_position int, cyclist_name FROZEN<fullname>, PRIMARY KEY (race_name, race_position) );
See "Creating a user-defined type" for information on creating UDTs. In DataStax Distribution of Apache Cassandra™ (DDAC), UDTs can be created unfrozen if only non-collection fields are used in the user-defined type creation. If the table is created with an unfrozen UDT, then individual field values can be updated and deleted.
CREATE TABLE cycling.cyclist_name ( id UUID PRIMARY KEY, lastname text, firstname text );
Creating a compound primary key
CREATE TABLE cycling.cyclist_category ( category text, points int, id UUID, lastname text, PRIMARY KEY (category, points)) WITH CLUSTERING ORDER BY (points DESC);
Creating a composite partition key
CREATE TABLE cycling.rank_by_year_and_name ( race_year int, race_name text, cyclist_name text, rank int, PRIMARY KEY ((race_year, race_name), rank) );
Setting caching
The database caches only the first N rows in a partition, as determined by the clustering order.
ALTER MATERIALIZED VIEW cycling.cyclist_by_age WITH caching = { 'keys' : 'ALL', 'rows_per_partition' : 'ALL' };
Creating a table with a CDC log:
CREATE TABLE cycling.cyclist_name WITH cdc = TRUE;
Adding a comment
ALTER MATERIALIZED VIEW cycling.cyclist_by_age WITH comment = "Basetable: cyclist_mv";
Change the speculative retries
ALTER TABLE users WITH speculative_retry = '10ms';
ALTER TABLE users WITH speculative_retry = '99percentile';
Enabling and disabling background compaction
ALTER TABLE mytable WITH COMPACTION = { 'class': 'SizeTieredCompactionStrategy', 'enabled': 'false' }
Disabling background compaction can be harmful: without it, the database does not regain disk space, and may allow zombies to propagate. Although compaction uses I/O, it is better to leave it enabled in most cases.
Reading extended compaction logs
Set the
log_all
subproperty to true
to collect in-depth
information about compaction activity on a node in a dedicated log file.
When extended compaction is enabled,
the database creates a file named compaction-%d.log (where
%d
is a sequential number) in home/logs.
-
type:enable
Lists SSTables that have been flushed previously
{"type":"enable","keyspace":"test","table":"t","time":1470071098866,"strategies": [ {"strategyId":"0","type":"LeveledCompactionStrategy","tables":[],"repaired":true,"folders": ["/home/carl/oss/cassandra/bin/../data/data"]}, {"strategyId":"1","type":"LeveledCompactionStrategy","tables":[],"repaired":false,"folders": ["/home/carl/oss/cassandra/bin/../data/data"] } ] }
type: flush
Logs a flush event from a memtable to an SSTable on disk, including the CompactionStrategy for each table.
{"type":"flush","keyspace":"test","table":"t","time":1470083335639,"tables": [ {"strategyId":"1","table": {"generation":1,"version":"mb","size":106846362,"details": {"level":0,"min_token":"-9221834874718566760","max_token":"9221396997139245178"} } } ] }
type: compaction
Logs a compaction event.
{"type":"compaction","keyspace":"test","table":"t","time":1470083660267,"start":"1470083660188","end":"1470083660267","input": [ {"strategyId":"1","table": {"generation":1372,"version":"mb","size":1064979,"details": {"level":1,"min_token":"7199305267944662291","max_token":"7323434447996777057"} } } ],"output": [ {"strategyId":"1","table": {"generation":1404,"version":"mb","size":1064306,"details": {"level":2,"min_token":"7199305267944662291","max_token":"7323434447996777057"} } } ] }
type: pending
Lists the number of pending tasks for a compaction strategy
{"type":"pending","keyspace":"test","table":"t","time":1470083447967,"strategyId":"1","pending":200}
Storing data in descending order
CREATE TABLE cycling.cyclist_category ( category text, points int, id UUID, lastname text, PRIMARY KEY (category, points)) WITH CLUSTERING ORDER BY (points DESC);
Using compaction storage
CREATE TABLE cycling.cyclist_category (
category text,
points int,
id UUID,
lastname text,
PRIMARY KEY (category, points))
WITH CLUSTERING ORDER BY (points DESC)
AND COMPACT STORAGE;
Restoring from commit log replayer
CREATE TABLE cycling.cyclist_emails (
userid text PRIMARY KEY,
emails set<text>
) WITH ID='5a1c395e-b41f-11e5-9f22-ba0be0483c18';
id
column of
system_schema.tables
. For
example:SELECT id
FROM system_schema.tables
WHERE keyspace_name = 'cycling'
AND table_name = 'cyclist_emails';
To perform a point-in-time restoration of the table, see Restoring a backup to a specific point-in-time.
CREATE TRIGGER
CREATE TRIGGER trigger_name ON [keyspace_name.]table_name USING 'org.apache.cassandra.triggers.AuditTrigger' ;Learn more.
CREATE TYPE
CREATE TYPE [ IF NOT EXISTS ] [keyspace_name.]type_name (field_name cql_datatype [ , field_name cql_datatype ... ]) ;Learn more.
Example
This example creates a user-defined type cycling.basic_info
that consists
of personal data about an individual cyclist.
CREATE TYPE cycling.basic_info ( birthday timestamp, nationality text, weight text, height text );
After defining the UDT, you can create a table that has columns with the UDT. CQL collection columns and other columns support the use of user-defined types, as shown in Using CQL examples.
DELETE
DELETE [ column_name [ term ] [ , ... ] ] FROM [keyspace_name.]table_name [ USING TIMESTAMP timestamp_value ] WHERE PK_column_conditions [ ( IF EXISTS | IF static_column_conditions ) ] ;Learn more.
Examples
Delete data from row
Delete the data in specific columns by listing them after the DELETE command, separated by commas. The deleted values are set to null.
DELETE firstname, lastname FROM cycling.cyclist_name WHERE id = e7ae5cf3-d358-4d99-b900-85902fda9bb0;
Delete an entire row
Entering no column names after DELETE, removes the entire matching row. Remove a cyclist entry from the cyclist_name table and return an error if no rows match.
DELETE FROM cycling.cyclist_name WHERE id=e7ae5cf3-d358-4d99-b900-85902fda9bb0 IF EXISTS;
IF
limits the where
clause, allowing selection based on values in non-PRIMARY KEY columns, such as first and
last name; remove the cyclist record if the first and last name do not
match.
DELETE FROM cycling.cyclist_name WHERE id =e7ae5cf3-d358-4d99-b900-85902fda9bb0 if firstname='Alex' and lastname='Smith';
The results show all the data
[applied] | firstname | lastname
-----------+-----------+----------
False | Alex | FRAME
Conditionally deleting columns
You can conditionally delete columns using IF or IF EXISTS. Deleting a column is similar to making an insert or update conditionally.
Add IF EXISTS to the command to ensure that the operation is not performed if the specified row does not exist:
DELETE id FROM cyclist_id WHERE lastname = 'WELTEN' and firstname = 'Bram' IF EXISTS;
Without IF EXISTS, the command proceeds with no standard output. If IF EXISTS returns true (if a row with this primary key does exist), standard output displays a table like the following:
If no such row exists, however, the conditions returns FALSE and the command fails. In this case, standard output looks like:
DELETE id FROM cyclist_id WHERE lastname = 'WELTEN' AND firstname = 'Bram' IF age = 2000;If all the conditions return TRUE, standard output is the same as if IF EXISTS returned true (see above). If any of the conditions fails, standard output displays
False
in the [applied]
column and also
displays information about the condition that failed:Conditional deletions incur a non-negligible performance cost and should be used sparingly.
Deleting old data using TIMESTAMP
The TIMESTAMP is an integer representing microseconds. You can identify the column for deletion using TIMESTAMP.
DELETE firstname, lastname FROM cycling.cyclist_name USING TIMESTAMP 1318452291034 WHERE lastname = 'VOS';
Deleting more than one row
The WHERE clause specifies which row or rows to delete from the table.
DELETE FROM cycling.cyclist_name WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;
To delete more than one row, use the keyword IN and supply a list of values in parentheses, separated by commas:
DELETE FROM cycling.cyclist_name WHERE firstname IN ('Alex', 'Marianne');
CQL supports an empty list of values in the IN clause, useful in Java Driver applications.
Deleting from a collection set, list or map
To delete an element from a map that is stored as one column in a row, specify the column_name followed by the key of the element in square brackets:
DELETE sponsorship ['sponsor_name'] FROM cycling.races WHERE race_name = 'Criterium du Dauphine';
To delete an element from a list, specify the column_name followed by the list index position in square brackets:
DELETE categories[3] FROM cycling.cyclist_history WHERE lastname = 'TIRALONGO';
To delete all elements from a set, specify the column_name by itself:
DELETE sponsorship FROM cycling.races WHERE race_name = 'Criterium du Dauphine';
DROP AGGREGATE
DROP AGGREGATE [ IF EXISTS ] [keyspace_name.]aggregate_name [ (argument_name [ , ... ]) ] ;Learn more.
Examples
Drop the avgState aggregate from the cycling keyspace.
DROP AGGREGATE IF EXISTS cycling.avgState;
DROP FUNCTION
DROP FUNCTION [ IF EXISTS ] [keyspace_name.]function_name [ (argument_name [ , ... ]) ] ;Learn more.
Examples
Drops the UDF from the cycling keyspace.
DROP FUNCTION IF EXISTS cycling.fLog;
DROP INDEX
DROP INDEX [ IF EXISTS ] [keyspace.]index_name ;Learn more.
Example
Drop the index ryear
from the
cycling.rank_by_year_and_name
table.
DROP INDEX cycling.ryear;
DROP KEYSPACE
DROP KEYSPACE [ IF EXISTS ] keyspace_name ;Learn more.
Example
DROP KEYSPACE cycling;
DROP MATERIALIZED VIEW
DROP MATERIALIZED VIEW [ IF EXISTS ] [keyspace_name.]view_name ;Learn more.
Example
DROP MATERIALIZED VIEW cycling.cyclist_by_age;
DROP ROLE
DROP ROLE [ IF EXISTS ] role_name ;Learn more.
Examples
Drop the team manager role.
DROP ROLE IF EXISTS team_manager;
DROP TABLE
DROP TABLE [ IF EXISTS ] [keyspace_name.]table_name ;Learn more.
Example
DROP TABLE cycling.cyclist_mv ;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot drop table when materialized views still depend on it (cycling.{cyclist_by_age,cyclist_by_country})"
DROP TABLE cycling.cyclist_name;
DROP TRIGGER
DROP TRIGGER [ IF EXISTS ] trigger_name ON [keyspace_name.]table_name ;Learn more.
DROP TYPE
DROP TYPE [ IF EXISTS ] [keyspace_name]type_name ;Learn more.
Examples
DROP TYPE cycling.basic_info ;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot drop user type cycling.basic_info as it is still used by table cycling.cyclist_stats"
DROP TABLE cycling.cyclist_stats ;
DROP TYPE cycling.basic_info ;
GRANT
GRANT permission [ ON object ] TO role_name ;Learn more.
Examples
In most environments, user authentication is handled by a plug-in that verifies users credentials against an external directory service such as LDAP. The CQL role is mapped to the external group by matching the role name to a group name. For simplicity, the following examples use internal users.
SELECT
statements on all tables in all
keyspaces:GRANT SELECT ON ALL KEYSPACES TO coach;
INSERT
,
UPDATE
, DELETE
and TRUNCATE
statements
on all tables in the field
keyspace:GRANT MODIFY ON KEYSPACE field TO manager;
ALTER
KEYSPACE
statements on the cycling keyspace, and also ALTER
TABLE
, CREATE INDEX
, and DROP INDEX
statements
on all tables in the cycling
keyspace:GRANT ALTER ON KEYSPACE cycling TO coach;
GRANT ALL PERMISSIONS ON cycling.name TO coach;
GRANT ALL ON KEYSPACE cycling TO cycling_admin;
SELECT
statements on rows that contain 'SPONSORED' in the cycling.rank
table:GRANT SELECT ON 'SPONSORED' ROWS IN cycling.rank TO sponsor;
LIST ALL PERMISSIONS
To grant create permissions on a work pool in a specific datacenter:
GRANT CREATE ON WORKPOOL datacenter_name TO role_name;
INSERT
INSERT INTO [keyspace_name.]table_name
[ column_list VALUES column_values ]
[ IF NOT EXISTS ]
[ USING [ TTL seconds ] [ [ AND ] TIMESTAMP epoch_in_microseconds ] ] ;
Learn more.
Examples
Specifying TTL and TIMESTAMP
Insert a cyclist name using both a TTL and timestamp.
INSERT INTO cycling.cyclist_name ( id, lastname, firstname ) VALUES ( 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47, 'KRUIKSWIJK', 'Steven' ) USING TTL 86400 AND TIMESTAMP 123456789;
- Time-to-live (TTL) in seconds
- Timestamp in microseconds since epoch
Inserting values into a collection (set and map)
To insert data into a collection, enclose values in curly brackets. Set values must be unique.
INSERT INTO cycling.cyclist_categories ( id, lastname, categories ) VALUES ( '6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47', 'KRUIJSWIJK', { 'GC', 'Time-trial', 'Sprint' } );
teams
that lists two recent team memberships for the
user
VOS.INSERT INTO cycling.cyclist_teams ( id, lastname, teams ) VALUES ( 5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'VOS', { 2015 : 'Rabobank-Liv Woman Cycling Team', 2014 : 'Rabobank-Liv Woman Cycling Team' } );
The size of one item in a collection is limited to 64K.
To insert data into a collection column of a user-defined type, enclose components of the type in parentheses within the curly brackets, as shown in "Using a user-defined type."
Inserting a row only if it does not already exist
Add IF NOT EXISTS to the command to ensure that the operation is not performed if a row with the same primary key already exists:
INSERT INTO cycling.cyclist_name ( id, lastname, firstname ) VALUES ( c4b65263-fe58-4846-83e8-f0e1c13d518f, 'RATTO', 'Rissella' ) IF NOT EXISTS;
true
in the
[applied]
column of the results. For example:
[applied]
-----------
True
false
in the [applied]
column and returns the values
for the existing row. For
example: [applied] | id | firstname | lastname
-----------+--------------------------------------+-----------+----------
False | c4b65263-fe58-4846-83e8-f0e1c13d518f | Rissella | RATTO
LIST PERMISSIONS
LIST permission [ ON resource_name ] [ OF role_name ] [ NORECURSIVE ] ;Learn more.
Example
List all permissions given to coach:
LIST ALL OF coach;
Output is:
rolename | resource | permission
----------+--------------------+------------
coach | <keyspace field> | MODIFY
List permissions given to all the roles:
LIST ALL;
Output is:
rolename | resource | permission
----------+----------------------+------------
coach | <keyspace field> | MODIFY
manager | <keyspace cyclist> | ALTER
manager | <table cyclist.name> | CREATE
manager | <table cyclist.name> | ALTER
manager | <table cyclist.name> | DROP
manager | <table cyclist.name> | SELECT
manager | <table cyclist.name> | MODIFY
manager | <table cyclist.name> | AUTHORIZE
coach | <all keyspaces> | SELECT
List all permissions on the cyclist.name table:
LIST ALL ON cyclist.name;
Output is:
username | resource | permission
----------+----------------------+------------
manager | <table cyclist.name> | CREATE
manager | <table cyclist.name> | ALTER
manager | <table cyclist.name> | DROP
manager | <table cyclist.name> | SELECT
manager | <table cyclist.name> | MODIFY
manager | <table cyclist.name> | AUTHORIZE
coach | <all keyspaces> | SELECT
List all permissions on the cyclist.name table and its parents:
LIST ALL ON cyclist.name NORECURSIVE;
Output is:
username | resource | permission
----------+----------------------+------------
manager | <table cyclist.name> | CREATE
manager | <table cyclist.name> | ALTER
manager | <table cyclist.name> | DROP
manager | <table cyclist.name> | SELECT
manager | <table cyclist.name> | MODIFY
manager | <table cyclist.name> | AUTHORIZE
LIST ROLES
LIST ROLES [ OF role_name ] [ NORECURSIVE ] ;Learn more.
Examples
LIST ROLES;Returns all roles:
role | super | login | options
----------------------------+-------+-------+---------
alice | False | True | {}
cassandra | True | True | {}
cycling_admin | False | False | {}
cycling_analyst | False | False | {}
cycling_app | False | True | {}
cycling_expense_management | False | False | {}
dantest1 | False | True | {}
hockey_analyst | False | False | {}
jane | False | True | {}
jason | False | True | {}
john | False | True | {}
keyspace_admin | False | False | {}
martin | False | True | {}
read_race | False | False | {}
role_admin | False | False | {}
root | True | True | {}
sandy | False | True | {}
sys_admin | False | True | {}
update_accounts | False | False | {}
update_races | False | False | {}
update_teams | False | False | {}
(21 rows)
LIST ROLES OF martin;
role | super | login | options
----------------+-------+-------+---------
keyspace_admin | False | False | {}
martin | False | True | {}
(2 rows)
RESTRICT ROWS
RESTRICT ROWS ON [keyspace_name.]table_name USING pk_column_name ;Learn more.
Examples
For the cyclist_expenses
table, configure the
cyclist_name
column for filtering so that permissions can be assigned. In
this example, we identify the column so that each cyclist can view only their own expenses:
RESTRICT ROWS ON cyclist_expenses USING cyclist_name;
GRANT SELECT ON 'Vera ADRIAN' ROWS IN cyclist_expenses TO cycling_accounts;
REVOKE
REVOKE privilege ON resource_name FROM role_name ;Learn more.
Example
SELECT
queries
on the cycling.name table.
REVOKE SELECT ON cycling.name FROM manager;Exceptions: Because of inheritance, the user can perform
SELECT
queries on
cycling.name if one of these conditions is met:- The user is a superuser.
- The user has
SELECT
onALL KEYSPACES
permissions. - The user has
SELECT
on the cycling keyspace.
GRANT
,
ALTER
or REVOKE
commands on all
roles:REVOKE ALTER ON ALL ROLES FROM coach;
SELECT
SELECT selectors FROM [keyspace_name.]table_name [ WHERE [ primary_key_conditions ] [ AND ] [ index_conditions ] [ GROUP BY column_name [ , ... ] ] [ ORDER BY PK_column_name [ , ... ] ( ASC | DESC ) ] [ ( LIMIT N | PER PARTITION LIMIT N ) ] [ ALLOW FILTERING ] ;Learn more.
Examples
Using a column alias
When your selection list includes functions or other complex expressions, use aliases to
make the output more readable. This example applies aliases to the
dateOf(created_at)
and blobAsText(content)
functions:
SELECT event_id, dateOf(created_at) AS creation_date, blobAsText(content) AS content FROM timeline;
The output labels these columns with more understandable names:
event_id | creation_date | content
-------------------------+--------------------------+----------------
550e8400-e29b-41d4-a716 | 2013-07-26 10:44:33+0200 | Some stuff
(1 rows)
The number of rows returned by the query is shown at the bottom of the output.
Specifying the source table using FROM
The following example SELECT statement returns the number of rows in
the IndexInfo
table in the system
keyspace:
SELECT COUNT(*) FROM system.IndexInfo;
Controlling the number of rows returned using LIMIT
The LIMIT
option sets the maximum number of rows that the query
returns:
SELECT lastname FROM cycling.cyclist_name LIMIT 50000;
Even if the query matches 105,291 rows, the database only returns the first 50,000.
The cqlsh
shell has a default row limit of 10,000. The Cassandra server and native protocol
do not limit the number of returned rows, but they apply a timeout to prevent malformed
queries from causing system instability.
Selecting partitions
partition_column = value
partition_column IN ( value1, value2 [ , ... ] )
partition_column1 = value1
AND partition_column2 = value2 [ AND ... ] )
Controlling the number of rows returned using PER PARTITION LIMIT
In Cassandra, the
PER PARTITION LIMIT
option sets the maximum number of rows that the
query returns from each partition. For example, create a table that will sort data into
more than one partition.
CREATE TABLE cycling.rank_by_year_and_name ( race_year int, race_name text, cyclist_name text, rank int, PRIMARY KEY ((race_year, race_name), rank) );
After inserting data, the table contains these rows:
race_year | race_name | rank | cyclist_name
-----------+--------------------------------------------+------+----------------------
2014 | 4th Tour of Beijing | 1 | Phillippe GILBERT
2014 | 4th Tour of Beijing | 2 | Daniel MARTIN
2014 | 4th Tour of Beijing | 3 | Johan Esteban CHAVES
2015 | Giro d'Italia - Stage 11 - Forli > Imola | 1 | Ilnur ZAKARIN
2015 | Giro d'Italia - Stage 11 - Forli > Imola | 2 | Carlos BETANCUR
2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Benjamin PRADES
2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 2 | Adam PHELAN
2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 3 | Thomas LEBAS
To get the top two racers in every race year and race name, use the
SELECT statement with PER PARTITION LIMIT 2
.
SELECT * FROM cycling.rank_by_year_and_name PER PARTITION LIMIT 2;
Output:
race_year | race_name | rank | cyclist_name
-----------+--------------------------------------------+------+-------------------
2014 | 4th Tour of Beijing | 1 | Phillippe GILBERT
2014 | 4th Tour of Beijing | 2 | Daniel MARTIN
2015 | Giro d'Italia - Stage 11 - Forli > Imola | 1 | Ilnur ZAKARIN
2015 | Giro d'Italia - Stage 11 - Forli > Imola | 2 | Carlos BETANCUR
2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Benjamin PRADES
2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 2 | Adam PHELAN
Filtering data using WHERE
The WHERE
clause introduces one or more relations that filter the rows
returned by SELECT.
The column specifications
- One or more members of the partition key of the table
- A clustering column, only if the relation is preceded by other relations that specify all columns in the partition key
- A column that is indexed using CREATE INDEX.
WHERE
clause, refer to a column using the
actual name, not an alias. Filtering on the partition key
id
as the table's
partition
key:CREATE TABLE cycling.cyclist_career_teams ( id UUID PRIMARY KEY, lastname text, teams set<text> );In this example, the SELECT statement includes in the partition key, so the WHERE clause can use the
id
column:SELECT id, lastname, teams FROM cycling.cyclist_career_teams WHERE id=5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
Filtering on a clustering column
Use a relation on a clustering column only if it is preceded by relations that reference all the elements of the partition key.
Example:
CREATE TABLE cycling.cyclist_points ( id UUID, firstname text, lastname text, race_title text, race_points int, PRIMARY KEY (id, race_points ) );
SELECT sum(race_points) FROM cycling.cyclist_points WHERE id=e3b19ec4-774a-4d1c-9e5a-decec1e30aac AND race_points > 7;
Output:
system.sum(race_points)
-------------------------
195
(1 rows)
In DataStax Enteprise 5.1 and later, it is possible to add ALLOW
FILTERING
to filter on a non-indexed cluster column.
ALLOW FILTERING
because it impacts
performance.The table definition is included in this example to show that
race_start_date
is a clustering column without a secondary index:
CREATE TABLE cycling.calendar ( race_id int, race_name text, race_start_date timestamp, race_end_date timestamp, PRIMARY KEY (race_id, race_start_date, race_end_date) );
SELECT * FROM cycling.calendar WHERE race_start_date='2015-06-13' ALLOW FILTERING;
Output:
race_id | race_start_date | race_end_date | race_name
---------+---------------------------------+---------------------------------+----------------
102 | 2015-06-13 07:00:00.000000+0000 | 2015-06-13 07:00:00.000000+0000 | Tour de Suisse
103 | 2015-06-13 07:00:00.000000+0000 | 2015-06-17 07:00:00.000000+0000 | Tour de France
It is possible to combine the partition key and a clustering column in a single relation. For details, see Comparing clustering columns.
Filtering on indexed columns
A WHERE clause in a SELECT on an indexed table must include at least one equality relation to the indexed column. For details, see Indexing a column.
Using the IN operator
Use IN
, an equals condition operator, to list multiple possible values
for a column. Thjs example selects two columns, first_name
and
last_name
, from three rows having employee ids (primary key) 105, 107,
or 104:
SELECT first_name, last_name FROM emp WHERE empID IN (105, 107, 104);
The list can consist of a range of column values separated by commas.
Using IN to filter on a compound or composite primary key
IN
condition on the last column of the partition key only when it
is preceded by equality conditions for all preceding columns of the partition key. For
example:CREATE TABLE parts ( part_type text, part_name text, part_num int, part_year text, serial_num text, PRIMARY KEY ((part_type, part_name), part_num, part_year) );
SELECT * FROM parts WHERE part_type = 'alloy' AND part_name = 'hubcap' AND part_num = 1249 AND part_year IN ('2010', '2015');
IN
, you can omit the equality test for clustering columns
other than the last. But this usage may require the use of ALLOW
FILTERING
, so it impacts performance. For
example:SELECT * FROM parts WHERE part_num = 123456 AND part_year IN ('2010', '2015') ALLOW FILTERING;
CQL supports an empty list of values in the IN
clause, useful in Java
Driver applications when passing empty arrays as arguments for the IN
clause.
When not to use IN
Under most conditions, using IN
in relations on the partition key is not
recommended. To process a list of values, the SELECT may have to query
many nodes, which degrades performance. For example, consider a single local datacenter
cluster with 30 nodes, a replication factor of 3, and a consistency level of
LOCAL_QUORUM
. A query on a single partition key query goes out to two
nodes. But if the SELECT uses the IN
condition, the
operation can involve more nodes — up to 20, depending on where the keys fall in the token
range.
Using IN
for clustering columns is safer. See Cassandra Query Patterns: Not using the “in” query for
multiple partitions for additional logic about using IN
.
Filtering on collections
Your query can retrieve a collection in its entirety. It can also index the collection
column, and then use the CONTAINS
condition in the WHERE
clause to filter the data for a particular value in the collection, or use
CONTAINS KEY
to filter by key. This example features a collection of
tags in the playlists table. The query can index the tags, then
filter on 'blues' in the tag set.
SELECT album, tags FROM playlists WHERE tags CONTAINS 'blues';
SELECT * FROM playlists WHERE venue CONTAINS 'The Fillmore';
SELECT * FROM playlists WHERE venue CONTAINS KEY '2014-09-22 22:00:00-0700';
Filtering a map's entries
CREATE INDEX blist_idx ON cycling.birthday_list (ENTRIES(blist));
SELECT * FROM cycling.birthday_list WHERE blist['age'] = '23';
Filtering a full frozen collection
FROZEN
collection
(set, list, or map). The query retrieves rows that fully match the collection's values.
CREATE INDEX rnumbers_idx ON cycling.race_starts (FULL(rnumbers));
SELECT * FROM cycling.race_starts WHERE rnumbers = [39, 7, 14];
Range relations
Cassandra supports greater-than and less-than comparisons, but for a given partition key, the conditions on the clustering column are restricted to the filters that allow selection of a contiguous set of rows.
CREATE TABLE ruling_stewards ( steward_name text, king text, reign_start int, event text, PRIMARY KEY (steward_name, king, reign_start) );
king
were not a component of the primary
key, you would need to create an index on king to use this
query:SELECT * FROM ruling_stewards WHERE king = 'Brego' AND reign_start >= 2450 AND reign_start < 2500 ALLOW FILTERING;
steward_name | king | reign_start | event
--------------+-------+-------------+--------------------
Boromir | Brego | 2477 | Attacks continue
Cirion | Brego | 2489 | Defeat of Balchoth
(2 rows)
To allow selection of a contiguous set of rows, the
WHERE clause must apply an equality condition to the
king
component of the primary key. The ALLOW FILTERING clause is also required.
ALLOW FILTERING
provides the capability to query the clustering columns
using any condition.
ALLOW FILTERING
is intended for development environments only and is
not recommended for production. When you attempt a potentially expensive query, such as
searching a range of rows, Cassandra displays this message:
Bad Request: Cannot execute this query as it might involve data
filtering and thus may have unpredictable performance. If you want
to execute this query despite the performance unpredictability,
use ALLOW FILTERING.
To run this type of query, use ALLOW FILTERING
, and restrict the
output to n
rows using LIMIT n
. For example:
SELECT * FROM ruling_stewards WHERE king = 'none' AND reign_start >= 1500 AND reign_start < 3000 LIMIT 10 ALLOW FILTERING;
Using LIMIT
does not prevent all problems caused by ALLOW
FILTERING
. In this example, if there are no entries without a value for
king, the SELECT scans the entire table, no
matter what the LIMIT
is.
It is not necessary to use LIMIT
with ALLOW
FILTERING
, and LIMIT
can be used by itself. But
LIMIT
can prevent a query from ranging over all partitions in a
datacenter, or across multiple datacenters..
Using compound primary keys and sorting results
ORDER BY
clause with a compound primary key: - Only include clustering columns in the
ORDER BY
clause. - In the
WHERE
clause, provide all the partition key values and clustering column values that precede the column(s) in theORDER BY
clause. In DataStax Distribution of Apache Cassandra™ (DDAC), the columns specified in theORDER BY
clause must be an ordered subset of the columns of the clustering key, starting from the first column of the clustering key. - When sorting multiple columns, the columns must be listed in the same order in the
ORDER BY
clause as they are listed in thePRIMARY KEY
clause of the table definition. - Sort ordering is limited. For example, if your table definition uses
CLUSTERING ORDER BY (start_month ASC, start_day ASC)
, then you can useORDER BY start_day, race
in your query (ASC
is the default). You can also reverse the sort ordering if you apply it to all of the columns; for example,ORDER BY start_day DESC, race DESC
. - Refer to a column using the actual name, not an alias.
For example, set up the playlists table
(which uses a compound primary key), and use this query to get information about a
particular playlist, ordered by song_order. You do not need to include the ORDER
BY
column in the select expression.
SELECT * FROM playlists WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204 ORDER BY song_order DESC LIMIT 50;
Output:
Or, create an index on playlist artists, and use this query to get titles of Fu Manchu songs on the playlist:
CREATE INDEX ON playlists(artist);
SELECT album, title FROM playlists WHERE artist = 'Fu Manchu';
Output:
Grouping results
A GROUP BY
clause condenses the selected rows that share the same values for a set of columns into a group.
For example, this query groups the rows by the race date in the
race_times
table:
SELECT race_date, race_name FROM cycling.race_times GROUP BY race_date;
race_date | race_name
------------+-----------------------------
2017-04-14 | 17th Santos Tour Down Under
(1 rows)
Computing aggregates
Cassandra provides standard built-in functions that return aggregate values to SELECT statements.
Using COUNT() to get the non-null value count for a column
A SELECT
expression using
COUNT(column_name)
returns the number of non-null values in a column.
COUNT
ignores null values.
For example, count the number of last
names in the cyclist_name
table:
SELECT COUNT(lastname) FROM cycling.cyclist_name;
Getting the number of matching rows and aggregate values with COUNT()
A SELECT
expression using
COUNT(*)
returns the number of rows that matched the query. Use
COUNT(1)
to get the same result. COUNT(*)
or
COUNT(1)
can be used in conjunction with other aggregate functions or
columns.
This example counts the number of rows in the cyclist name table:
SELECT COUNT(*) FROM cycling.cyclist_name;
This example calculates the maximum value for start day in the cycling events table and counts the number of rows returned:
SELECT start_month, MAX(start_day), COUNT(*) FROM cycling.events WHERE year = 2017 AND discipline = 'Cyclo-cross';
This example provides a year that is not stored in the events table:
SELECT start_month, MAX(start_day) FROM cycling.events WHERE year = 2022 ALLOW FILTERING;
start_month | system.max(start_day)
-------------+-----------------------
null | null
(1 rows)
Getting
maximum and minimum values in a columnSELECT
expression using
MAX(column_name)
returns the maximum value in a
column. When the column's data type is numeric (bigint
,
decimal
, double
, float
,
int
, smallint
), this is the highest
value.SELECT MAX(points) FROM cycling.cyclist_category;
Output:
MIN
returns the minimum value. If the query includes a
WHERE
clause, MAX
or MIN
returns the
largest or smallest value from the rows that satisfy the WHERE
condition.SELECT category, MIN(points) FROM cycling.cyclist_category WHERE category = 'GC';
Output:
MAX
or MIN
has an
ascii
or text
data type, these functions return the
last or first item in an alphabetic sort of the column values. If the specified column has
data type date
or timestamp
, these functions return the
most recent or least recent times and dates. If a column has a null value,
MAX
and MIN
ignores that value; if the column for an
entire set of rows contains null, MAX
and MIN
return
null.Getting the sum or average of a column of numbers
Cassandra
computes the sum or average of all values in a column when SUM
or
AVG
is used in the SELECT statement:
SUM
or AVG
aggregation, Cassandra includes that row in the row count,
but uses a zero value to calculate the average.SUM
and
AVG
functions do not work with text
,
uuid
or date
fields.Retrieving the date/time a write occurred
The WRITETIME function applied to a column returns the date/time in microseconds at which the column was written to the database.
For example, to retrieve the date/time that a write occurred to the first_name column of the user whose last name is Jones:
SELECT WRITETIME (first_name) FROM users WHERE last_name = 'Jones';
writetime(first_name)
-----------------------
1353010594789000
The WRITETIME
output in microseconds converts to November 15, 2012 at
12:16:34 GMT-8.
Retrieving the time-to-live of a column
INSERT INTO cycling.calendar ( race_id, race_name, race_start_date, race_end_date ) VALUES ( 200, 'placeholder', '2015-05-27', '2015-05-27' ) USING TTL 100;
UPDATE cycling.calendar USING TTL 300 SET race_name = 'dummy' WHERE race_id = 200 AND race_start_date = '2015-05-27' AND race_end_date = '2015-05-27';After inserting the TTL, use this SELECT statement to check its current value:
SELECT TTL(race_name) FROM cycling.calendar WHERE race_id = 200;Output:
ttl(race_name)
----------------
276
(1 rows)
Retrieving values in the JSON format
This option is available in DDAC. For details, see Retrieval using JSON.
TRUNCATE
TRUNCATE [ TABLE ] [keyspace_name.]table_name ;Learn more.
Examples
- If necessary, use the cqlsh CONSISTENCY command to set the
consistency level to
ALL
. - Use nodetool status or some other tool to make sure all nodes are up and receiving connections.
- Use TRUNCATE or TRUNCATE TABLE, followed by the
table name. For
example:
TRUNCATE cycling.user_activity;
TRUNCATE TABLE cycling.user_activity;
truncate cycling.user_activity;
Unable to complete request: one or more nodes were unavailable.
UNRESTRICT
UNRESTRICT ROWS ON [keyspace_name.]table_name ;Learn more.
Examples
UNRESTRICT ROWS ON cyclist_name;
UPDATE
UPDATE [keyspace_name.]table_name [ USING TTL time_value ] [ [ AND ] USING TIMESTAMP timestamp_value ] SET assignment [ , assignment , ... ] WHERE row_specification [ ( IF EXISTS | IF NOT EXISTS | IF condition [ AND condition ... ] ) ] ;Learn more.
Examples
Updating a column
Update a column in several rows at once:
UPDATE users SET state = 'TX' WHERE user_uuid IN ( 88b8fd18-b1ed-4e96-bf79-4280797cba80, 06a8913c-c0d6-477c-937d-6c1b69a95d43, bc108776-7cb5-477f-917d-869c12dfffa8 );CQL supports an empty list of values in the IN clause, useful in Java Driver applications when passing empty arrays as arguments for the IN clause.
Update several columns in a single row:
UPDATE cycling.cyclists SET firstname = 'Marianne', lastname = 'VOS' WHERE id = 88b8fd18-b1ed-4e96-bf79-4280797cba80;
Update a row in a table with a complex primary key
To do this, specify all keys in a table having compound and clustering columns. For example, update the value of a column in a table having a compound primary key, userid and url:
UPDATE excelsior.clicks USING TTL 432000 SET user_name = 'bob' WHERE userid = cfd66ccc-d857-4e90-b1e5-df98a3d40cd6 AND url = 'http://google.com'; UPDATE Movies SET col1 = val1, col2 = val2 WHERE movieID = key1; UPDATE Movies SET col3 = val3 WHERE movieID IN (key1, key2, key3); UPDATE Movies SET col4 = 22 WHERE movieID = key4;
Updating a counter column
To update a counter column value in a counter table, specify the increment or decrement to apply to the current value.
UPDATE cycling.popular_count SET popularity = popularity + 2 WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;
To use a lightweight transaction on a counter column to ensure accuracy, put one or more counter updates in the batch statement. For details, see Performing conditional updates in a batch.
Creating a partition using UPDATE
cyclists
table, whose primary key is (id)
, you can
UPDATE the partition with id e7cd5752-bc0d-4157-a80f-7523add8dbcd
, even
though it does not exist
yet:UPDATE cycling.cyclists SET firstname = 'Anna', lastname = 'VAN DER BREGGEN' WHERE id = e7cd5752-bc0d-4157-a80f-7523add8dbcd;
Updating a list
To insert values into the list:
UPDATE cycling.upcoming_calendar SET events = ['Criterium du Dauphine','Tour de Suisse'];
To prepend an element to the list, enclose it in square brackets and use the addition (+) operator:
UPDATE cycling.upcoming_calendar SET events = ['Tour de France'] + events WHERE year = 2015 AND month = 06;
To append an element to the list, switch the order of the new element data and the list name:
UPDATE cycling.upcoming_calendar SET events = events + ['Tour de France'] WHERE year = 2017 AND month = 05;
To add an element at a particular position, use the list index position in square brackets:
UPDATE cycling.upcoming_calendar SET events[4] = 'Tour de France' WHERE year = 2016 AND month = 07;
To remove all elements having a particular value, use the subtraction operator (-) and put the list value in square brackets:
UPDATE cycling.upcoming_calendar SET events = events - ['Criterium du Dauphine'] WHERE year = 2016 AND month = 07;
To update data in a collection column of a user-defined type, enclose components of the type in parentheses within the curly brackets, as shown in "Using a user-defined type."
set
or
map
collection types are safer for updates.Updating a set
To add an element to a set, use the UPDATE command and the addition (+) operator:
UPDATE cycling.cyclist_career_teams SET teams = teams + {'Team DSB - Ballast Nedam'} WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
To remove an element from a set, use the subtraction (-) operator:
UPDATE cycling.cyclist_career_teams SET teams = teams - {'DSB Bank Nederland bloeit'} WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
To remove all elements from a set:
UPDATE cycling.cyclist_career_teams SET teams = {} WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
Updating a map
To set or replace map data, enclose the values in map syntax: strings in curly brackets, separated by a colon.
UPDATE cycling.upcoming_calendar SET description = description + {'Criterium du Dauphine' : 'Easy race'} WHERE year = 2015;
To update or set a specific element, such as adding a new race to the calendar in a map
named events
:
UPDATE cycling.upcoming_calendar SET events[2] = 'Vuelta Ciclista a Venezuela' WHERE year = 2016 AND month = 06;
To set the a TTL for each map element:
UPDATE cycling.upcoming_calendar USING TTL <ttl_value>
SET events[2] = 'Vuelta Ciclista a Venezuela'
WHERE year = 2016 AND month = 06;
You can update the map by adding one or more elements separated by commas:
UPDATE cycling.upcoming_calendar SET description = description + {'Criterium du Dauphine' : 'Easy race', 'Tour du Suisse' : 'Hard uphill race'} WHERE year = 2015 AND month = 6;
Remove elements from a map in the same way using - instead of +.
About updating sets and maps caution
CQL supports alternate methods for updating sets and maps. These alternatives may seem to accomplish the same tasks, but the database handles them differently in important ways.
UPDATE cycling.upcoming_calendar SET description = {'Criterium du Dauphine' : 'Easy race', 'Tour du Suisse' : 'Hard uphill race'} WHERE year = 2015 AND month = 6;
+
operator
as described
above:UPDATE cycling.upcoming_calendar SET description = description + { 'Tour de France' : 'Very competitive'} WHERE year = 2015 AND month = 6;
UPDATE cycling.upcoming_calendar SET description = {'Criterium du Dauphine' : 'Easy race', 'Tour du Suisse' : 'Hard uphill race', 'Tour de France' : 'Very competitive'} WHERE year = 2015 AND month = 6;
These two statements seem to do the same thing. But behind the scenes, the database processes the second statement by deleting the entire collection and replacing it with a new collection containing three entries. This creates tombstones for the deleted entries, even though these entries are identical to the entries in the new map collection. If your code updates all map collections this way, it generates many tombstones, which may slow the system down.
The examples above use map collections, but the same caution applies to updating sets.
Updating a UDT with non-collection fields
In DataStax Distribution of Apache Cassandra™ (DDAC), to change the value of an individual field value in a user-defined type with non-collection fields, use the UPDATE command:
UPDATE cyclist_stats SET basics.birthday = '2000-12-12' WHERE id = 220844bf-4860-49d6-9a4b-6b5d3a79cbfb;
Conditionally updating columns
You can conditionally update columns using IF or IF EXISTS.
Add IF EXISTS to the command to ensure that the operation is not performed if the specified row exists:
UPDATE cycling.cyclist_id SET age = 28 WHERE lastname = 'WELTEN' AND firstname = 'Bram' IF EXISTS;
Without IF EXISTS, the command proceeds with no standard output. If IF EXISTS returns true (if a row with this primary key does exist), standard output displays a table like the following:
If no such row exists, however, the condition returns FALSE and the command fails. In this case, standard output looks like:
UPDATE cyclist_id SET id = 15a116fc-b833-4da6-ab9a-4a3775750239 WHERE lastname = 'WELTEN' AND firstname = 'Bram' IF age = 18;If all the conditions return TRUE, standard output is the same as if IF EXISTS returned true (see above). If any of the conditions fails, standard output displays
False
in the [applied]
column and also
displays information about the condition that failed:Conditional updates are examples of lightweight transactions. They incur a non-negligible performance cost and should be used sparingly.
Performing conditional updates in a BATCH
The UPDATE command creates a new row if no matching row is found. New rows are not immediately available for lightweight transactions applied in the same BATCH.
For example, consider a simple table with four defined columns:
CREATE TABLE mytable (a int, b int, s int static, d text, PRIMARY KEY (a, b)) BEGIN BATCH INSERT INTO mytable (a, b, d) values (7, 7, 'a') UPDATE mytable SET s = 7 WHERE a = 7 IF s = NULL; APPLY BATCH
In the first batch above, the insert command creates a partition with
primary key values (7,7) but does not set a value for the s
column. Even
though the s
column was not defined for this row, the IF s =
NULL
conditional succeeds, so the batch succeeds. (In previous versions, the
conditional would have failed,and that failure would have caused the entire batch to
fail.)
The second batch demonstrates more complex handling of a lightweight transaction:
BEGIN BATCH INSERT INTO mytable (a, b, d) values (7, 7, 'a') UPDATE mytable SET s = 1 WHERE a = 1 IF s = NULL; APPLY BATCH
In this case, the IF
statement tests the value of a column in a
partition that does not even exist before it is created by the UPDATE.
Even so, Cassandra recognizes the
implicit presence of the partition and its column s
, and lets the
conditional test succeed. This allows the batch to succeed.
USE
USE keyspace_name ;Learn more.
Example
USE PortfolioDemo;
USE "Excalibur";