How to create CQL materialized views.
In Cassandra 3.0 and later, a materialized view is a table that is built from another table's data with a new primary key and new properties. In Cassandra, queries are optimized by primary key definition. Standard practice is to create the table for the query, and create a new table if a different query is needed. Until Cassandra 3.0, these additional tables had to be updated manually in the client application. A materialized view automatically receives the updates from its source table.
Materialized views are suited for high cardinality data. The data in a materialized view is arranged serially based on the view's primary key. Materialized views cause hotspots when low cardinality data is inserted.
- The columns of the source table's primary key must be part of the materialized view's primary key.
- Only one new column can be added to the materialized view's primary key. Static columns are not allowed.
You can create a materialized view with its own WHERE conditions and its own properties.
This table holds values for the name, age, birthday, and country affiliation of several cyclists.
CREATE TABLE cyclist_mv (cid UUID PRIMARY KEY, name text, age int, birthday date, country text);
cyclist_mvtable can be the basis of a materialized view that uses age in the primary key.
CREATE MATERIALIZED VIEW cyclist_by_age AS SELECT age, birthday, name, country FROM cyclist_mv WHERE age IS NOT NULL AND cid IS NOT NULL PRIMARY KEY (age, cid);
CREATE MATERIALIZED VIEWstatement has several features:
AS SELECTphrase identifies the columns copied from the base table to the materialized view.
FROMphrase identifies the source table from which Cassandra will copy the data.
WHEREclause must include all primary key columns with the
IS NOT NULLphrase so that only rows with data for all the primary key columns are copied to the materialized view.
- As with any table, the specification of the primary key columns. Since
cyclist_mv, the source table, uses cid as its primary key, cid must be present in the materialized view's primary key.Note: In this materialized view, age is used as the primary key and cid is a clustering column. In Cassandra3.0 and earlier, you cannot use a column as a clustering column if any of its values is larger than 64K bytes.
SELECT age, name, birthday FROM cyclist_by_age WHERE age = 18;
CREATE MATERIALIZED VIEW cyclist_by_birthday AS SELECT age, birthday, name, country FROM cyclist_mv WHERE birthday IS NOT NULL AND cid IS NOT NULL PRIMARY KEY (birthday, cid); CREATE MATERIALIZED VIEW cyclist_by_country AS SELECT age,birthday, name, country FROM cyclist_mv WHERE country IS NOT NULL AND cid IS NOT NULL PRIMARY KEY (country, cid);
SELECT age, name, birthday FROM cyclist_by_country WHERE country = 'Netherlands';
SELECT age, name, birthday FROM cyclist_by_birthday WHERE birthday = '1987-09-04';
This materialized view will only store information for cyclists from the Netherlands because of the addition of:
CREATE MATERIALIZED VIEW cyclist_by_birthday_Netherlands AS SELECT age, birthday, name, country FROM cyclist_mv WHERE birthday IS NOT NULL AND cid IS NOT NULL AND country='Netherlands' PRIMARY KEY (birthday, cid);
WHEREclause. Now a query can be submitted to find those cyclists from the Netherlands with a particular birthday:
SELECT age, name, birthday FROM cyclist_by_birthday WHERE birthday = '1997-02-08';
When another INSERT is executed on
cyclist_mv, Cassandra updates the source
table and both of these materialized views. When data is deleted from
cyclist_mv, Cassandra deletes the same data from any related materialized
Materialized views allow fast lookup of the data using the normal Cassandra read path. However, materialized views do not have the same write performance as normal table writes. Cassandra performs an additional read-before-write to update each materialized view. To complete an update, Cassandra preforms a data consistency check on each replica. A write to the source table incurs latency. The performance of deletes on the source table also suffers. If a delete on the source table affects two or more contiguous rows, this delete is tagged with one tombstone. But these same rows may not be contiguous in materialized views derived from the source table. If they are not, Cassandra creates multiple tombstones in the materialized views.
Cassandra can only write data directly to source tables, not to materialized views. Cassandra updates a materialized view asynchronously after inserting data into the source table, so the update of materialized view is delayed. Cassandra performs a read repair to a materialized view only after updating the source table.