Prepare to use materialized views

A materialized view is a table that is built from data in another table, with a new primary key and new properties. Queries are optimized by the primary key definition. Without materialized views, the standard practice is to create a table for each query, and to duplicate the data into a new table whenever a different query is needed. Client applications must then manually update every additional table as well as the original. With a materialized view, the data is updated automatically whenever the source table changes.
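
For example, a base table keyed by one column can be queried by a different column through a materialized view. The following minimal sketch uses the DataStax Python driver (cassandra-driver); the keyspace, table, and column names are illustrative, and the keyspace is assumed to already exist:

  # Illustrative only: keyspace, table, view, and column names are hypothetical.
  from cassandra.cluster import Cluster

  session = Cluster(["127.0.0.1"]).connect("cycling")   # assumes the keyspace exists

  # Base (original) table, keyed by cyclist id.
  session.execute("""
      CREATE TABLE IF NOT EXISTS cyclist_mv (
          cid uuid PRIMARY KEY,
          name text,
          age int,
          birthday date,
          country text
      )
  """)

  # Materialized view keyed by age, so the same data can be queried by age.
  # Every column in the view's primary key must be restricted with IS NOT NULL,
  # and the view's primary key must include all base primary key columns.
  session.execute("""
      CREATE MATERIALIZED VIEW IF NOT EXISTS cyclist_by_age AS
          SELECT age, cid, birthday, country, name
          FROM cyclist_mv
          WHERE age IS NOT NULL AND cid IS NOT NULL
          PRIMARY KEY (age, cid)
  """)

  # Writes go only to the base table; the database maintains the view.
  session.execute(
      "INSERT INTO cyclist_mv (cid, name, age, country) VALUES (uuid(), %s, %s, %s)",
      ("Marianne Vos", 30, "Netherlands"),
  )

  # Reads can use the view's own primary key (the view may lag briefly behind the base).
  for row in session.execute("SELECT name, country FROM cyclist_by_age WHERE age = 30"):
      print(row.name, row.country)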

Materialized views are not available for Astra DB Serverless.

Propagation of updates to materialized views

The following steps illustrate how Cassandra and DataStax Enterprise propagate updates from an original table to its materialized views:

  1. The coordinator node receives an update from a client for the original table and forwards it to the configured replica nodes.

    1. When the cassandra.mv_enable_coordinator_batchlog property is enabled, the coordinator writes a batchlog containing the original table write to QUORUM nodes before forwarding the write to the replicas. This configuration provides better protection against a coordinator failing in the middle of a request, but it slows the view write operation considerably.

  2. Upon receiving an update from the coordinator for the original table, each replica node completes the following tasks:

    1. Generate view updates for each materialized view of the original table.

      • A local read of the original table row is performed to determine whether a previous view row must be removed or modified.

      • A local lock is acquired on the original table partition when generating the view update to ensure that the view updates are serialized. This lock is released after updates to the view are propagated to the replicas and base updates are applied locally.

    2. After generating the view updates, deterministically compute the paired view replica for each view update, so that the view replication work is distributed among the base replicas.

      • If the base replica is also a view replica, the base replica chooses itself as the paired view replica, and applies the view update synchronously.

      • Otherwise, the update is written synchronously to the local batchlog for durability, and sent asynchronously to the remote paired view replica.

    3. Acknowledge the write to the coordinator node.

    4. After receiving an acknowledgement of all asynchronous paired view writes, remove the local batchlog. Otherwise, replay the batchlog at a later time to propagate the view update to the replica. If a replica is down during batchlog replay, one hint is written for each mutation.

  3. After receiving an acknowledgement from all nodes (based on the consistency level), the coordinator node returns a successful write response to the client.
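
The following is a highly simplified, illustrative Python sketch of the replica-side flow in step 2. It is not Cassandra's implementation; all class, function, and variable names are hypothetical, and details such as durability, hints, and failure handling are omitted:

  # Hypothetical sketch of step 2 on a single base replica; not Cassandra's code.
  import threading

  class BaseReplica:
      def __init__(self, name, ring):
          self.name = name
          self.ring = ring            # candidate view replica names (hypothetical)
          self.base_rows = {}         # local base-table storage
          self.view_rows = {}         # local view storage, used when paired with itself
          self.batchlog = []          # local batchlog for asynchronous remote view writes
          self.locks = {}             # per-partition locks (step 2.1)

      def apply_base_update(self, partition_key, new_row):
          lock = self.locks.setdefault(partition_key, threading.Lock())
          with lock:                                       # serialize view-update generation
              old_row = self.base_rows.get(partition_key)  # local read-before-write
              view_update = (partition_key, old_row, new_row)

              # Step 2.2: deterministically pick the paired view replica.
              paired = self.ring[hash(partition_key) % len(self.ring)]
              if paired == self.name:
                  self.view_rows[partition_key] = new_row      # apply view update locally
              else:
                  self.batchlog.append((paired, view_update))  # log durably, send async

              self.base_rows[partition_key] = new_row          # apply the base write locally
          return "ACK"                                          # step 2.3: ack the coordinator

  replica = BaseReplica("node1", ["node1", "node2", "node3"])
  print(replica.apply_base_update("pk1", {"name": "Marianne Vos", "age": 30}))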

Performance considerations

Materialized views allow fast lookup of data using the normal read path. However, materialized views do not have the same write performance as normal table writes because the database performs an additional read-before-write operation to update each materialized view. To complete an update, the database performs a data consistency check on each replica. A write to the source table incurs additional latency (roughly 10% for each materialized view), and the performance of deletes on the source table also suffers.

If a delete on the source table affects two or more contiguous rows, this delete is tagged with one tombstone. However, these same rows may not be contiguous in materialized views derived from the source table. If they are not, the database creates multiple tombstones in the materialized views.
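
For example, consider an illustrative base table clustered by time and a view re-keyed by a data column (all names are hypothetical). A single range delete on the base partition covers contiguous rows, but the matching view rows are spread across many view partitions, each of which receives its own tombstone:

  # Illustrative schema; keyspace, table, view, and column names are hypothetical.
  import uuid
  from cassandra.cluster import Cluster

  session = Cluster(["127.0.0.1"]).connect("sensors")   # assumes the keyspace exists

  session.execute("""
      CREATE TABLE IF NOT EXISTS readings (
          sensor_id uuid,
          reading_time timestamp,
          temperature int,
          PRIMARY KEY (sensor_id, reading_time)
      )
  """)

  # The view is re-keyed by temperature, so rows that are contiguous by
  # reading_time in the base partition are scattered across view partitions.
  session.execute("""
      CREATE MATERIALIZED VIEW IF NOT EXISTS readings_by_temperature AS
          SELECT temperature, sensor_id, reading_time
          FROM readings
          WHERE temperature IS NOT NULL
            AND sensor_id IS NOT NULL
            AND reading_time IS NOT NULL
          PRIMARY KEY (temperature, sensor_id, reading_time)
  """)

  # One range delete covers contiguous base rows with a single range tombstone,
  # but it generates one tombstone per affected row in the view.
  sensor_id = uuid.uuid4()
  session.execute(
      "DELETE FROM readings WHERE sensor_id = %s "
      "AND reading_time >= '2025-01-01' AND reading_time < '2025-02-01'",
      (sensor_id,),
  )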

Additional work is required to ensure that all correct state changes to a given row are applied to materialized views, especially regarding concurrent updates. By using materialized views, performance is traded for data correctness.

Consistency considerations

Each original table replica writes the view updates locally (when it is also a view replica), or writes a local batchlog before returning the original table write, as described in step 2 of the propagation steps above. If the original table replica cannot update a remote view during the write operation, the replica retries the update during batchlog replay. This mechanism ensures that all changes to each original table replica are reflected in the views, unless data loss occurs in the original table replica.

The write operation for the view replica is asynchronous to ensure that availability is not compromised. A consequence is that a read from a view might not see a successful write to the original table until the base replicas have propagated the write. Under normal conditions, data is quickly made available in the views. Use the ViewWriteMetrics metric to track the view propagation time.

Scenario that can result in base-view inconsistency

In an ordinary CQL table, when a row is successfully written to the replicas required by the consistency level, data loss can occur if those replicas become permanently unavailable before the update is propagated to the remaining replicas. The following example illustrates this scenario:

  1. Write to a table with a replication factor of three (RF=3) and a consistency level of ONE.

  2. The base replica is also the coordinator node.

  3. The coordinator responds to the client that the write was successful.

  4. The machine hosting the coordinator node dies.

In the case of materialized views, the previous example carries additional implications. If the base replica (also the coordinator node) successfully wrote the view update to another node before failing, the row exists in the view but not in the original table, creating an orphaned view row.

Another scenario that can create an orphaned view row is when an original table row loses all of its replicas without a repair between failures. Conversely, if a view row loses all of its replicas, the original table row no longer has a corresponding view row.

To avoid these situations, write to original tables that have materialized views using consistency levels greater than ONE (such as LOCAL_QUORUM). Alternatively, use the -Dcassandra.mv_enable_coordinator_batchlog=true option to provide better protection against a coordinator failing in the middle of a request.

Using the -Dcassandra.mv_enable_coordinator_batchlog=true option slows the view write operation considerably.
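
For example, with the DataStax Python driver, the consistency level can be raised per statement when writing to a base table that has materialized views (the keyspace and table names below are illustrative and reuse the earlier sketch):

  from cassandra import ConsistencyLevel
  from cassandra.cluster import Cluster
  from cassandra.query import SimpleStatement

  session = Cluster(["127.0.0.1"]).connect("cycling")   # hypothetical keyspace

  # Write the base table at LOCAL_QUORUM instead of ONE so that an acknowledged
  # write survives the permanent loss of a single replica.
  insert = SimpleStatement(
      "INSERT INTO cyclist_mv (cid, name, age, country) VALUES (uuid(), %s, %s, %s)",
      consistency_level=ConsistencyLevel.LOCAL_QUORUM,
  )
  session.execute(insert, ("Annemiek van Vleuten", 32, "Netherlands"))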

Known limitations of materialized views

The following limitations are known for materialized views:

  • Currently, there is no way to automatically detect and fix permanent inconsistency between the base and the view (CASSANDRA-10346)

  • Incremental repair is not supported on original tables with materialized views (CASSANDRA-12888)

Frequently asked questions about materialized views

Can materialized views be used in production environments?

Before using materialized views, be aware of the known limitations and test them against your application requirements to determine if materialized views are suitable for your environment.

After materialized views are deployed, regular maintenance repairs are required to ensure that original tables and views are consistent. Provided that limitations are validated against the application and consistency considerations are observed, materialized views can be deployed in production environments.

Is manual denormalization better than using materialized views?

This choice depends on the use case and requirements. Ensuring consistency between views and tables in the face of complex failures and concurrent updates requires additional mechanisms, such as row locking, view repair, and paired view replication; reproducing these mechanisms manually is significant extra work. In practice, no guarantees are lost by using built-in materialized views instead of manually denormalized tables.

Manual denormalization can be preferable when consistency is less important, or when data is never updated or deleted. In these cases, write to multiple tables from the client rather than using materialized views.
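
A minimal sketch of the manual approach, assuming two hand-maintained tables that hold the same data under different primary keys (all names are illustrative and the tables are assumed to exist):

  import uuid

  from cassandra import ConsistencyLevel
  from cassandra.cluster import Cluster
  from cassandra.query import BatchStatement

  session = Cluster(["127.0.0.1"]).connect("cycling")   # hypothetical keyspace
  cid = uuid.uuid4()

  # Manual denormalization: the client writes the same data to both tables.
  # A logged batch keeps the two inserts atomic, but the application remains
  # responsible for keeping the tables consistent on every update and delete.
  batch = BatchStatement(consistency_level=ConsistencyLevel.LOCAL_QUORUM)
  batch.add(
      "INSERT INTO cyclist_by_id (cid, name, age) VALUES (%s, %s, %s)",
      (cid, "Marianne Vos", 30),
  )
  batch.add(
      "INSERT INTO cyclist_by_name (name, cid, age) VALUES (%s, %s, %s)",
      ("Marianne Vos", cid, 30),
  )
  session.execute(batch)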

Which DSE version is recommended for using materialized views?

DataStax recommends using the most recent version of DSE 6.x to get the most stable implementation of materialized views. Several bug fixes and performance improvements for materialized views were added in these versions.
