Materialized views (MVs) examples

Materialized views (MVs) are a feature of Apache Cassandra® that allows you to create a new table that is a copy of an existing table with a different primary key. This enables queries that aren’t possible on the original table without using ALLOW FILTERING.

Prepare to create materialized views

Before you can create a MV, you must create a keyspace and create a table.

To follow along with these examples, create the following keyspace and table, and then load the sample data. The table has a single primary key column (cid), and it holds values for the name, age, birthday, and country affiliation of several cyclists.

Keyspace:

CREATE KEYSPACE IF NOT EXISTS cycling
WITH REPLICATION = {
  'class' : 'SimpleStrategy',
  'replication_factor' : 1
};

Table:

CREATE TABLE IF NOT EXISTS cycling.cyclist_base (
  cid UUID PRIMARY KEY,
  name text,
  age int,
  birthday date,
  country text
);

Rows:

 cid                                  | age | birthday   | country       | name
--------------------------------------+-----+------------+---------------+-------------------
 ffdfa2a7-5fc6-49a7-bfdc-3fcdcfdd7156 |  18 | 1997-02-08 |   Netherlands |  Pascal EENKHOORN
 15a116fc-b833-4da6-ab9a-4a7775752836 |  18 | 1997-08-19 | United States |      Adrien COSTA
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 |  22 | 1993-06-18 |   New Zealand |        Alex FRAME
 c9c9c484-5e4a-4542-8203-8d047a01b8a8 |  27 | 1987-09-04 |        Brazil |   Cristian EGIDIO
 862cc51f-00a1-4d5a-976b-a359cab7300e |  20 | 1994-09-04 |       Denmark |     Joakim BUKDAL
 1c526849-d3a2-42a3-bcf9-7903c80b3d16 |  19 | 1998-12-23 |     Australia |     Kanden GROVES
 18f471bf-f631-4bc4-a9a2-d6f6cf5ea503 |  18 | 1997-03-29 |   Netherlands |       Bram WELTEN
 220844bf-4860-49d6-9a4b-6b5d3a79cbfb |  38 | 1977-07-08 |         Italy |   Paolo TIRALONGO
 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 |  28 | 1987-06-07 |   Netherlands | Steven KRUIKSWIJK
 95fed1a4-db7e-4c8d-8b2d-f7f9340f2c0b |  20 | 1993-04-09 | United States |        Zam WEATON

(10 rows)

Create MV based on a non-primary key INT column

Using the example table, create a MV using the age column (a non-primary key INT column) as the partition key in the primary key for the MV.

As with any table, the materialized view must specify the primary key columns.

Because the original table cyclist_base uses cid as its primary key, cid must be present in the materialized view’s primary key. In this example, cid is used as a clustering column, not a partition key.

The MV is called cycling.cyclist_by_age.
```
CREATE MATERIALIZED VIEW IF NOT EXISTS cycling.cyclist_by_age AS
  SELECT age, cid, birthday, country, name
  FROM cycling.cyclist_base
  WHERE age IS NOT NULL AND cid IS NOT NULL
  PRIMARY KEY (age, cid)
  WITH CLUSTERING ORDER BY (cid ASC)
  AND caching = {
    'keys' : 'ALL',
    'rows_per_partition' : '100'
  }
  AND comment = 'Based on table cyclist_base';
```
Note the following characteristics of this CREATE MATERIALIZED VIEW statement:
- The AS SELECT phrase identifies the columns copied from the original table to the materialized view.
- The FROM phrase identifies the original table from which data is copied.
- The WHERE clause must include all primary key columns with the IS NOT NULL phrase so that only rows with data for all the primary key columns are copied to the materialized view.

To verify that the materialized view was created, use DESCRIBE MATERIALIZED VIEW to get the MV schema:

DESCRIBE MATERIALIZED VIEW cycling.cyclist_by_age;

Result

CREATE MATERIALIZED VIEW cycling.cyclist_by_age AS
    SELECT age, cid, birthday, country, name
    FROM cycling.cyclist_base
    WHERE age IS NOT NULL AND cid IS NOT NULL
    PRIMARY KEY (age, cid)
    WITH CLUSTERING ORDER BY (cid ASC)
    AND additional_write_policy = '99PERCENTILE'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': '100'}
    AND comment = 'Based on table cyclist_base'
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99PERCENTILE';

Query using the MV.

Because this MV is partitioned by age, it supports queries on the age column:

SELECT age, name, birthday FROM cycling.cyclist_by_age WHERE age = 18;

Result

 age | name             | birthday
-----+------------------+------------
  18 |     Adrien COSTA | 1997-08-19
  18 |      Bram WELTEN | 1997-03-29
  18 | Pascal EENKHOORN | 1997-02-08

(3 rows)

You can create other materialized views based on the same table that use different primary key columns to support other queries, as shown in the following examples.

Create MV based on a non-primary key DATE column

Using the example table, create a MV using the birthday column (a non-primary key DATE column) as the partition key in the primary key.

As with any table, the materialized view must specify the primary key columns.

Because the original table cyclist_base uses cid as its primary key, cid must be present in the materialized view’s primary key. In this example, cid is used as a clustering column, not a partition key.

The MV is called cycling.cyclist_by_birthday.
```
CREATE MATERIALIZED VIEW IF NOT EXISTS cycling.cyclist_by_birthday
  AS SELECT age, cid, birthday, name, country
  FROM cycling.cyclist_base
  WHERE birthday IS NOT NULL AND cid IS NOT NULL
  PRIMARY KEY (birthday, cid);
```

Verify that the materialized view was created:

DESCRIBE MATERIALIZED VIEW cycling.cyclist_by_birthday;

Result

CREATE MATERIALIZED VIEW cycling.cyclist_by_birthday AS
    SELECT birthday, cid, age, country, name
    FROM cycling.cyclist_base
    WHERE birthday IS NOT NULL AND cid IS NOT NULL
    PRIMARY KEY (birthday, cid)
    WITH CLUSTERING ORDER BY (cid ASC)
    AND additional_write_policy = '99PERCENTILE'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99PERCENTILE';

Query using the MV.

Because this MV is partitioned by birthday, it supports queries based on the birthday column:

SELECT age, name, birthday FROM cycling.cyclist_by_birthday
  WHERE birthday = '1987-09-04'

Result

 age | name            | birthday
-----+-----------------+------------
  27 | Cristian EGIDIO | 1987-09-04

(1 rows)

Create MV based on a non-primary key TEXT column

Using the example table, create a MV using the country column (a non-primary key TEXT column) as the partition key in the primary key.

As with any table, the materialized view must specify the primary key columns.

Because the original table cyclist_base uses cid as its primary key, cid must be present in the materialized view’s primary key. In this example, cid is used as a clustering column, not a partition key.

The MV is called cycling.cyclist_by_country.
```
CREATE MATERIALIZED VIEW IF NOT EXISTS cycling.cyclist_by_country
  AS SELECT age, cid, birthday, name, country
  FROM cycling.cyclist_base
  WHERE country IS NOT NULL AND cid IS NOT NULL
  PRIMARY KEY (country, cid);
```

Verify that the materialized view was created:

DESCRIBE MATERIALIZED VIEW cycling.cyclist_by_country;

Result

CREATE MATERIALIZED VIEW cycling.cyclist_by_country AS
    SELECT country, cid, age, birthday, name
    FROM cycling.cyclist_base
    WHERE country IS NOT NULL AND cid IS NOT NULL
    PRIMARY KEY (country, cid)
    WITH CLUSTERING ORDER BY (cid ASC)
    AND additional_write_policy = '99PERCENTILE'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99PERCENTILE';

Query using the MV.

Because this MV is partitioned by country, it supports queries on the country column:

SELECT age, name, birthday FROM cycling.cyclist_by_country
  WHERE country = 'Netherlands';

Result

 age | name              | birthday
-----+-------------------+------------
  18 |       Bram WELTEN | 1997-03-29
  28 | Steven KRUIKSWIJK | 1987-06-07
  18 |  Pascal EENKHOORN | 1997-02-08

(3 rows)

Create MV using a `WHERE` clause to exclude rows

Using the example table, create a MV using the birthday column (a non-primary key TEXT column) as the partition key in the primary key, and use a WHERE clause to include only rows with a specific value in the country column.

As explained in the previous examples, the materialized view must specify the primary key columns, and it must include, at minimum, the primary key columns from the original table. In this example, cid (from the original table’s primary key) is used as a clustering column, not a partition key.

The MV is called cycling.cyclist_by_birthday_netherlands.
```
CREATE MATERIALIZED VIEW IF NOT EXISTS cycling.cyclist_by_birthday_netherlands AS
  SELECT age, cid, birthday, name, country
  FROM cycling.cyclist_base
  WHERE birthday IS NOT NULL AND cid IS NOT NULL
    AND country = 'Netherlands'
  PRIMARY KEY (birthday, cid);
```

Verify that the materialized view was created:

DESCRIBE MATERIALIZED VIEW cycling.cyclist_by_birthday_netherlands;

Result

CREATE MATERIALIZED VIEW cycling.cyclist_by_birthday_netherlands AS
    SELECT birthday, cid, age, country, name
    FROM cycling.cyclist_base
    WHERE birthday IS NOT NULL AND cid IS NOT NULL AND country = 'Netherlands'
    PRIMARY KEY (birthday, cid)
    WITH CLUSTERING ORDER BY (cid ASC)
    AND additional_write_policy = '99PERCENTILE'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99PERCENTILE';

Query using the MV.

Because this MV is partitioned by birthday, it supports queries on the birthday column:

SELECT age, name, birthday FROM cycling.cyclist_by_birthday_netherlands
  WHERE birthday = '1997-02-08';

Result

 age | name             | birthday
-----+------------------+------------
  18 | Pascal EENKHOORN | 1997-02-08

(1 rows)

Create MV based on non-primary key DATE and INT columns

Using the example table, create a MV using the birthday column (a non-primary key DATE column) and the age column (a non-primary key INT column) as the clustering columns in the primary key for the MV.

As explained in the previous examples, the materialized view must specify the primary key columns, and it must include, at minimum, the primary key columns from the original table. In this example, cid (from the original table’s primary key) is still the partition key, and birthday and age are used as clustering columns.

The MV is called cycling.cyclist_by_birthday_and_age.
```
CREATE MATERIALIZED VIEW IF NOT EXISTS cycling.cyclist_by_birthday_and_age AS
  SELECT age, cid, birthday, country, name
  FROM cycling.cyclist_base
  WHERE age IS NOT NULL AND birthday IS NOT NULL AND cid IS NOT NULL
  PRIMARY KEY (cid, birthday, age);
```

Verify that the materialized view was created:

DESCRIBE MATERIALIZED VIEW cycling.cyclist_by_birthday_and_age;

Result

CREATE MATERIALIZED VIEW cycling.cyclist_by_birthday_and_age AS
    SELECT cid, birthday, age, country, name
    FROM cycling.cyclist_base
    WHERE age IS NOT NULL AND birthday IS NOT NULL AND cid IS NOT NULL
    PRIMARY KEY (cid, birthday, age)
    WITH CLUSTERING ORDER BY (birthday ASC, age ASC)
    AND additional_write_policy = '99PERCENTILE'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99PERCENTILE';

Query using the MV.

Clustering columns must be queried in the order they are defined in the primary key.

Based on the primary key for this MV, you can query on the birthday and age columns:
```
SELECT age, name, birthday FROM cycling.cyclist_by_birthday_and_age
  WHERE birthday = '1997-02-08'
```

Create MV based on non-primary key columns with a `WHERE` clause

Using the example table, create a MV using the birthday and age columns as the clustering columns in the primary key, as well as a WHERE clause.

As explained in the previous examples, the materialized view must specify the primary key columns, and it must include, at minimum, the primary key columns from the original table. In this example, cid (from the original table’s primary key) is still the partition key, and birthday and age are used as clustering columns.

Additionally, this example uses a WHERE clause to ensure that the MV includes only rows that meet all of the following conditions:
- birthday is not NULL
- cid is not NULL
- age is equal to 19
The MV is called cycling.cyclist_by_birthday_and_age_19.
```
CREATE MATERIALIZED VIEW IF NOT EXISTS cycling.cyclist_by_birthday_and_age_19 AS
  SELECT age, cid, birthday, country, name FROM cycling.cyclist_base
  WHERE birthday IS NOT NULL
    AND cid IS NOT NULL
    AND age = 19
  PRIMARY KEY (cid, birthday, age);
```

Verify that the MV was created:

DESCRIBE MATERIALIZED VIEW cycling.cyclist_by_birthday_and_age_19;

Result

CREATE MATERIALIZED VIEW cycling.cyclist_by_birthday_and_age_19 AS
    SELECT cid, birthday, age, country, name
    FROM cycling.cyclist_base
    WHERE birthday IS NOT NULL AND cid IS NOT NULL AND age = 19
    PRIMARY KEY (cid, birthday, age)
    WITH CLUSTERING ORDER BY (birthday ASC, age ASC)
    AND additional_write_policy = '99PERCENTILE'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99PERCENTILE';

Query using the MV:

SELECT * FROM cycling.cyclist_by_birthday_and_age_19
  WHERE birthday = '1998-12-23';

Result

 age | name          | birthday
-----+---------------+-----------
  19 | Kanden GROVES | 1998-12-23

(1 rows)

When querying on multiple clustering columns, these columns must be queried in the order they are defined in the primary key.

Use a materialized view to perform queries that aren’t possible on the original table

The following example demonstrates how to use a materialized view to perform queries that aren’t possible on the original table unless you used ALLOW FILTERING. ALLOW FILTERING is not recommended because of performance degradation.

This example uses the following table:

CREATE TABLE IF NOT EXISTS cycling.mechanic (
  emp_id int,
  dept_id int,
  name text,
  age int,
  birthdate date,
  PRIMARY KEY (emp_id, dept_id)
);

The table contains these rows:

 emp_id | dept_id | age | birthdate  | name
--------+---------+-----+------------+------------
      5 |       3 |  25 | 1996-10-04 | Lisa SMITH
      1 |       1 |  21 | 1992-06-18 | Fred GREEN
      2 |       1 |  22 | 1993-01-15 | John SMITH
      4 |       2 |  24 | 1995-08-19 | Jack JONES
      3 |       2 |  23 | 1994-02-07 |   Jane DOE

(5 rows)

Based on the cycling.mechanic table, the following MV includes only rows where all columns are not NULL, and it reconfigures the primary key to support different queries. In the primary key for the MV, the age column is the partition key, and the original table’s primary key columns, emp_id and dept_id, are used as clustering columns.

CREATE MATERIALIZED VIEW IF NOT EXISTS cycling.mechanic_view AS
  SELECT emp_id, dept_id, name, age, birthdate FROM cycling.mechanic
  WHERE emp_id IS NOT NULL
    AND dept_id IS NOT NULL
    AND name IS NOT NULL
    AND age IS NOT NULL
    AND birthdate IS NOT NULL
  PRIMARY KEY (age, emp_id, dept_id);

Queries using the MV only need to specify the partition key for the MV. For example, the following query succeeds when run on the MV, but it would fail if run on the original table without ALLOW FILTERING:

SELECT * FROM cycling.mechanic_view WHERE age = 21;

Result

 age | emp_id | dept_id | birthdate  | name
-----+--------+---------+------------+------------
  21 |      1 |       1 | 1992-06-18 | Fred GREEN

(1 rows)

Data synchronization between original tables and materialized views

When data is written to a table, the database updates the original table and any associated materialized views.

When data is deleted from a table, the database deletes the same data from any associated materialized views.

CQL queries can only write data directly to the original tables, not to materialized views. You cannot send a write request directly to a MV; you must write to the table, which then triggers an update to the materialized view.

The database updates materialized views asynchronously after inserting data into the original table. Therefore, updates to materialized views are inherently delayed. Read-repair operations on materialized views are performed only after updating the original table.

Materialized views (MVs) examples

Prepare to create materialized views

Create MV based on a non-primary key INT column

Create MV based on a non-primary key DATE column

Create MV based on a non-primary key TEXT column

Create MV using a `WHERE` clause to exclude rows

Create MV based on non-primary key DATE and INT columns

Create MV based on non-primary key columns with a `WHERE` clause

Use a materialized view to perform queries that aren’t possible on the original table

Data synchronization between original tables and materialized views

Was this helpful?

Give Feedback