Materialized views (MVs) examples
Materialized views (MVs) are a feature of Apache Cassandra® that allows you to create a new table that is a copy of an existing table with a different primary key.
This enables queries that aren’t possible on the original table without using ALLOW FILTERING.
Prepare to create materialized views
Before you can create a MV, you must create a keyspace and create a table.
To follow along with these examples, create the following keyspace and table, and then load the sample data.
The table has a single primary key column (cid), and it holds values for the name, age, birthday, and country affiliation of several cyclists.
-
Keyspace:
CREATE KEYSPACE IF NOT EXISTS cycling WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }; -
Table:
CREATE TABLE IF NOT EXISTS cycling.cyclist_base ( cid UUID PRIMARY KEY, name text, age int, birthday date, country text ); -
Rows:
cid | age | birthday | country | name --------------------------------------+-----+------------+---------------+------------------- ffdfa2a7-5fc6-49a7-bfdc-3fcdcfdd7156 | 18 | 1997-02-08 | Netherlands | Pascal EENKHOORN 15a116fc-b833-4da6-ab9a-4a7775752836 | 18 | 1997-08-19 | United States | Adrien COSTA e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 22 | 1993-06-18 | New Zealand | Alex FRAME c9c9c484-5e4a-4542-8203-8d047a01b8a8 | 27 | 1987-09-04 | Brazil | Cristian EGIDIO 862cc51f-00a1-4d5a-976b-a359cab7300e | 20 | 1994-09-04 | Denmark | Joakim BUKDAL 1c526849-d3a2-42a3-bcf9-7903c80b3d16 | 19 | 1998-12-23 | Australia | Kanden GROVES 18f471bf-f631-4bc4-a9a2-d6f6cf5ea503 | 18 | 1997-03-29 | Netherlands | Bram WELTEN 220844bf-4860-49d6-9a4b-6b5d3a79cbfb | 38 | 1977-07-08 | Italy | Paolo TIRALONGO 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | 28 | 1987-06-07 | Netherlands | Steven KRUIKSWIJK 95fed1a4-db7e-4c8d-8b2d-f7f9340f2c0b | 20 | 1993-04-09 | United States | Zam WEATON (10 rows)
Create MV based on a non-primary key INT column
-
Using the example table, create a MV using the
agecolumn (a non-primary key INT column) as the partition key in the primary key for the MV.As with any table, the materialized view must specify the primary key columns.
Because the original table
cyclist_baseusescidas its primary key,cidmust be present in the materialized view’s primary key. In this example,cidis used as a clustering column, not a partition key.The MV is called
cycling.cyclist_by_age.CREATE MATERIALIZED VIEW IF NOT EXISTS cycling.cyclist_by_age AS SELECT age, cid, birthday, country, name FROM cycling.cyclist_base WHERE age IS NOT NULL AND cid IS NOT NULL PRIMARY KEY (age, cid) WITH CLUSTERING ORDER BY (cid ASC) AND caching = { 'keys' : 'ALL', 'rows_per_partition' : '100' } AND comment = 'Based on table cyclist_base';Note the following characteristics of this
CREATE MATERIALIZED VIEWstatement:-
The
AS SELECTphrase identifies the columns copied from the original table to the materialized view. -
The
FROMphrase identifies the original table from which data is copied. -
The
WHEREclause must include all primary key columns with theIS NOT NULLphrase so that only rows with data for all the primary key columns are copied to the materialized view.
-
-
To verify that the materialized view was created, use
DESCRIBE MATERIALIZED VIEWto get the MV schema:DESCRIBE MATERIALIZED VIEW cycling.cyclist_by_age;ResultCREATE MATERIALIZED VIEW cycling.cyclist_by_age AS SELECT age, cid, birthday, country, name FROM cycling.cyclist_base WHERE age IS NOT NULL AND cid IS NOT NULL PRIMARY KEY (age, cid) WITH CLUSTERING ORDER BY (cid ASC) AND additional_write_policy = '99PERCENTILE' AND bloom_filter_fp_chance = 0.01 AND caching = {'keys': 'ALL', 'rows_per_partition': '100'} AND comment = 'Based on table cyclist_base' AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'} AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1.0 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair = 'BLOCKING' AND speculative_retry = '99PERCENTILE'; -
Query using the MV.
Because this MV is partitioned by
age, it supports queries on theagecolumn:SELECT age, name, birthday FROM cycling.cyclist_by_age WHERE age = 18;Resultage | name | birthday -----+------------------+------------ 18 | Adrien COSTA | 1997-08-19 18 | Bram WELTEN | 1997-03-29 18 | Pascal EENKHOORN | 1997-02-08 (3 rows)
You can create other materialized views based on the same table that use different primary key columns to support other queries, as shown in the following examples.
Create MV based on a non-primary key DATE column
-
Using the example table, create a MV using the
birthdaycolumn (a non-primary key DATE column) as the partition key in the primary key.As with any table, the materialized view must specify the primary key columns.
Because the original table
cyclist_baseusescidas its primary key,cidmust be present in the materialized view’s primary key. In this example,cidis used as a clustering column, not a partition key.The MV is called
cycling.cyclist_by_birthday.CREATE MATERIALIZED VIEW IF NOT EXISTS cycling.cyclist_by_birthday AS SELECT age, cid, birthday, name, country FROM cycling.cyclist_base WHERE birthday IS NOT NULL AND cid IS NOT NULL PRIMARY KEY (birthday, cid); -
Verify that the materialized view was created:
DESCRIBE MATERIALIZED VIEW cycling.cyclist_by_birthday;ResultCREATE MATERIALIZED VIEW cycling.cyclist_by_birthday AS SELECT birthday, cid, age, country, name FROM cycling.cyclist_base WHERE birthday IS NOT NULL AND cid IS NOT NULL PRIMARY KEY (birthday, cid) WITH CLUSTERING ORDER BY (cid ASC) AND additional_write_policy = '99PERCENTILE' AND bloom_filter_fp_chance = 0.01 AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} AND comment = '' AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'} AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1.0 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair = 'BLOCKING' AND speculative_retry = '99PERCENTILE'; -
Query using the MV.
Because this MV is partitioned by
birthday, it supports queries based on thebirthdaycolumn:SELECT age, name, birthday FROM cycling.cyclist_by_birthday WHERE birthday = '1987-09-04'Resultage | name | birthday -----+-----------------+------------ 27 | Cristian EGIDIO | 1987-09-04 (1 rows)
Create MV based on a non-primary key TEXT column
-
Using the example table, create a MV using the
countrycolumn (a non-primary key TEXT column) as the partition key in the primary key.As with any table, the materialized view must specify the primary key columns.
Because the original table
cyclist_baseusescidas its primary key,cidmust be present in the materialized view’s primary key. In this example,cidis used as a clustering column, not a partition key.The MV is called
cycling.cyclist_by_country.CREATE MATERIALIZED VIEW IF NOT EXISTS cycling.cyclist_by_country AS SELECT age, cid, birthday, name, country FROM cycling.cyclist_base WHERE country IS NOT NULL AND cid IS NOT NULL PRIMARY KEY (country, cid); -
Verify that the materialized view was created:
DESCRIBE MATERIALIZED VIEW cycling.cyclist_by_country;ResultCREATE MATERIALIZED VIEW cycling.cyclist_by_country AS SELECT country, cid, age, birthday, name FROM cycling.cyclist_base WHERE country IS NOT NULL AND cid IS NOT NULL PRIMARY KEY (country, cid) WITH CLUSTERING ORDER BY (cid ASC) AND additional_write_policy = '99PERCENTILE' AND bloom_filter_fp_chance = 0.01 AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} AND comment = '' AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'} AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1.0 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair = 'BLOCKING' AND speculative_retry = '99PERCENTILE'; -
Query using the MV.
Because this MV is partitioned by
country, it supports queries on thecountrycolumn:SELECT age, name, birthday FROM cycling.cyclist_by_country WHERE country = 'Netherlands';Resultage | name | birthday -----+-------------------+------------ 18 | Bram WELTEN | 1997-03-29 28 | Steven KRUIKSWIJK | 1987-06-07 18 | Pascal EENKHOORN | 1997-02-08 (3 rows)
Create MV using a WHERE clause to exclude rows
-
Using the example table, create a MV using the
birthdaycolumn (a non-primary key TEXT column) as the partition key in the primary key, and use aWHEREclause to include only rows with a specific value in thecountrycolumn.As explained in the previous examples, the materialized view must specify the primary key columns, and it must include, at minimum, the primary key columns from the original table. In this example,
cid(from the original table’s primary key) is used as a clustering column, not a partition key.The MV is called
cycling.cyclist_by_birthday_netherlands.CREATE MATERIALIZED VIEW IF NOT EXISTS cycling.cyclist_by_birthday_netherlands AS SELECT age, cid, birthday, name, country FROM cycling.cyclist_base WHERE birthday IS NOT NULL AND cid IS NOT NULL AND country = 'Netherlands' PRIMARY KEY (birthday, cid); -
Verify that the materialized view was created:
DESCRIBE MATERIALIZED VIEW cycling.cyclist_by_birthday_netherlands;ResultCREATE MATERIALIZED VIEW cycling.cyclist_by_birthday_netherlands AS SELECT birthday, cid, age, country, name FROM cycling.cyclist_base WHERE birthday IS NOT NULL AND cid IS NOT NULL AND country = 'Netherlands' PRIMARY KEY (birthday, cid) WITH CLUSTERING ORDER BY (cid ASC) AND additional_write_policy = '99PERCENTILE' AND bloom_filter_fp_chance = 0.01 AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} AND comment = '' AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'} AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1.0 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair = 'BLOCKING' AND speculative_retry = '99PERCENTILE'; -
Query using the MV.
Because this MV is partitioned by
birthday, it supports queries on thebirthdaycolumn:SELECT age, name, birthday FROM cycling.cyclist_by_birthday_netherlands WHERE birthday = '1997-02-08';Resultage | name | birthday -----+------------------+------------ 18 | Pascal EENKHOORN | 1997-02-08 (1 rows)
Create MV based on non-primary key DATE and INT columns
-
Using the example table, create a MV using the
birthdaycolumn (a non-primary key DATE column) and theagecolumn (a non-primary key INT column) as the clustering columns in the primary key for the MV.As explained in the previous examples, the materialized view must specify the primary key columns, and it must include, at minimum, the primary key columns from the original table. In this example,
cid(from the original table’s primary key) is still the partition key, andbirthdayandageare used as clustering columns.The MV is called
cycling.cyclist_by_birthday_and_age.CREATE MATERIALIZED VIEW IF NOT EXISTS cycling.cyclist_by_birthday_and_age AS SELECT age, cid, birthday, country, name FROM cycling.cyclist_base WHERE age IS NOT NULL AND birthday IS NOT NULL AND cid IS NOT NULL PRIMARY KEY (cid, birthday, age); -
Verify that the materialized view was created:
DESCRIBE MATERIALIZED VIEW cycling.cyclist_by_birthday_and_age;ResultCREATE MATERIALIZED VIEW cycling.cyclist_by_birthday_and_age AS SELECT cid, birthday, age, country, name FROM cycling.cyclist_base WHERE age IS NOT NULL AND birthday IS NOT NULL AND cid IS NOT NULL PRIMARY KEY (cid, birthday, age) WITH CLUSTERING ORDER BY (birthday ASC, age ASC) AND additional_write_policy = '99PERCENTILE' AND bloom_filter_fp_chance = 0.01 AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} AND comment = '' AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'} AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1.0 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair = 'BLOCKING' AND speculative_retry = '99PERCENTILE'; -
Query using the MV.
Clustering columns must be queried in the order they are defined in the primary key.
Based on the primary key for this MV, you can query on the
birthdayandagecolumns:SELECT age, name, birthday FROM cycling.cyclist_by_birthday_and_age WHERE birthday = '1997-02-08'
Create MV based on non-primary key columns with a WHERE clause
-
Using the example table, create a MV using the
birthdayandagecolumns as the clustering columns in the primary key, as well as aWHEREclause.As explained in the previous examples, the materialized view must specify the primary key columns, and it must include, at minimum, the primary key columns from the original table. In this example,
cid(from the original table’s primary key) is still the partition key, andbirthdayandageare used as clustering columns.Additionally, this example uses a
WHEREclause to ensure that the MV includes only rows that meet all of the following conditions:-
birthdayis notNULL -
cidis notNULL -
ageis equal to19
The MV is called
cycling.cyclist_by_birthday_and_age_19.CREATE MATERIALIZED VIEW IF NOT EXISTS cycling.cyclist_by_birthday_and_age_19 AS SELECT age, cid, birthday, country, name FROM cycling.cyclist_base WHERE birthday IS NOT NULL AND cid IS NOT NULL AND age = 19 PRIMARY KEY (cid, birthday, age); -
-
Verify that the MV was created:
DESCRIBE MATERIALIZED VIEW cycling.cyclist_by_birthday_and_age_19;ResultCREATE MATERIALIZED VIEW cycling.cyclist_by_birthday_and_age_19 AS SELECT cid, birthday, age, country, name FROM cycling.cyclist_base WHERE birthday IS NOT NULL AND cid IS NOT NULL AND age = 19 PRIMARY KEY (cid, birthday, age) WITH CLUSTERING ORDER BY (birthday ASC, age ASC) AND additional_write_policy = '99PERCENTILE' AND bloom_filter_fp_chance = 0.01 AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} AND comment = '' AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'} AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1.0 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair = 'BLOCKING' AND speculative_retry = '99PERCENTILE'; -
Query using the MV:
SELECT * FROM cycling.cyclist_by_birthday_and_age_19 WHERE birthday = '1998-12-23';Resultage | name | birthday -----+---------------+----------- 19 | Kanden GROVES | 1998-12-23 (1 rows)When querying on multiple clustering columns, these columns must be queried in the order they are defined in the primary key.
Use a materialized view to perform queries that aren’t possible on the original table
The following example demonstrates how to use a materialized view to perform queries that aren’t possible on the original table unless you used ALLOW FILTERING.
ALLOW FILTERING is not recommended because of performance degradation.
This example uses the following table:
CREATE TABLE IF NOT EXISTS cycling.mechanic (
emp_id int,
dept_id int,
name text,
age int,
birthdate date,
PRIMARY KEY (emp_id, dept_id)
);
The table contains these rows:
emp_id | dept_id | age | birthdate | name
--------+---------+-----+------------+------------
5 | 3 | 25 | 1996-10-04 | Lisa SMITH
1 | 1 | 21 | 1992-06-18 | Fred GREEN
2 | 1 | 22 | 1993-01-15 | John SMITH
4 | 2 | 24 | 1995-08-19 | Jack JONES
3 | 2 | 23 | 1994-02-07 | Jane DOE
(5 rows)
Based on the cycling.mechanic table, the following MV includes only rows where all columns are not NULL, and it reconfigures the primary key to support different queries.
In the primary key for the MV, the age column is the partition key, and the original table’s primary key columns, emp_id and dept_id, are used as clustering columns.
CREATE MATERIALIZED VIEW IF NOT EXISTS cycling.mechanic_view AS
SELECT emp_id, dept_id, name, age, birthdate FROM cycling.mechanic
WHERE emp_id IS NOT NULL
AND dept_id IS NOT NULL
AND name IS NOT NULL
AND age IS NOT NULL
AND birthdate IS NOT NULL
PRIMARY KEY (age, emp_id, dept_id);
Queries using the MV only need to specify the partition key for the MV.
For example, the following query succeeds when run on the MV, but it would fail if run on the original table without ALLOW FILTERING:
SELECT * FROM cycling.mechanic_view WHERE age = 21;
age | emp_id | dept_id | birthdate | name
-----+--------+---------+------------+------------
21 | 1 | 1 | 1992-06-18 | Fred GREEN
(1 rows)
Data synchronization between original tables and materialized views
When data is written to a table, the database updates the original table and any associated materialized views.
When data is deleted from a table, the database deletes the same data from any associated materialized views.
CQL queries can only write data directly to the original tables, not to materialized views. You cannot send a write request directly to a MV; you must write to the table, which then triggers an update to the materialized view.
The database updates materialized views asynchronously after inserting data into the original table. Therefore, updates to materialized views are inherently delayed. Read-repair operations on materialized views are performed only after updating the original table.