Understanding materialized views
Learn how DSEAstra propagates updates from a base table to its materialized views.
Learn how DataStax Enterprise (DSE)Astra propagates updates from a base table to its materialized views, and consider the performance impacts and consistency requirements.
How materialized views work
The following steps illustrate how DSEAstra propagates updates from a base table to its materialized views.
-
The coordinator node receives an update from a client for the base table and forwards it to the configured replica nodes.
-
When the
cassandra.mv_enable_coordinator_batchlog
property is enabled, the coordinator will write a batchlog to QUORUM nodes containing the base table write before forwarding them to the replicas. This configuration provides better protection against a coordinator failing in the middle of a request, but slows the view write operation considerably. See CASSANDRA-10230 for more information about the batchlog coordinator.
-
-
Upon receiving an update from the coordinator for the base table, each replica node completes the following tasks:
-
Generate view updates for each materialized view of the base table.
-
A local read is completed in the base table row to determine if a previous view row must be removed or modified.
-
A local lock is acquired on the base table partition when generating the view update to ensure that the view updates are serialized. This lock is released after updates to the view are propagated to the replicas and base updates are applied locally.
-
-
After generating view updates, deterministically compute its paired view replica for each view update, so that the view replication work is distributed among base replicas.
-
If the base replica is also a view replica, the base replica chooses itself as the paired view replica, and applies the view update synchronously.
-
Otherwise, the update is written synchronously to the local batchlog for durability, and sent asynchronously to the remote paired view replica.
-
-
Acknowledge the write to the coordinator node.
-
After receiving an acknowledgement of all asynchronous paired view writes, remove the local batchlog. Otherwise, replay the batchlog at a later time to propagate the view update to the replica. If a replica is down during batchlog replay, one hint is written for each mutation.
-
-
After receiving an acknowledgement from all nodes (based on consistency level), the coordinator node returns a successfully write response to the client.
For additional information on how materialized views work, see the following posts on the DataStax Developer Blog.
Performance considerations
Materialized views allow fast lookup of data using the normal read path. However, materialized views do not have the same write performance as normal table writes because the database performs an additional read-before-write operation to update each materialized view. To complete an update, the database performs a data consistency check on each replica. A write to the source table incurs latency (~10% for each materialized view), and the performance of deletes on the source table also suffers.
If a delete on the source table affects two or more contiguous rows, this delete is tagged with one tombstone. However, these same rows may not be contiguous in materialized views derived from the source table. If they are not, the database creates multiple tombstones in the materialized views.
Additional work is required to ensure that all correct state changes to a given row are applied to materialized views, especially regarding concurrent updates. By using materialized views, performance is traded for data correctness.
Consistency considerations
Each base table replica writes the view updates locally (when it is also a view replica), or writes a local batchlog before returning the base table write (as described in https://docs.datastax.com/en/dse/6.8/cql/cql/cql_using/understanding-materialized-views.html#understandMVwriteViewUpdate[/en/dse/6.8/cql/cql/cql_using/understanding-materialized-views.html#understandMVwriteViewUpdate]). If the base table replica cannot update a remote view during the write operation, the replica retries the update during batchlog replay. This mechanism ensures that all changes to each base table replica are reflected in the views, unless data loss occurs in the base table replica.
The write operation for the view replica is asynchronous to ensure availability is not compromised.
A consequence is that a read operation for a view might not immediately see a successful write to the base table until the write operation is propagated by the base replicas.
Under normal conditions, data is quickly made available in the views.
Use the ViewWriteMetrics
metric to track the view propagation time.
Scenario that can result in base-view inconsistency
In an ordinary DSEAstra table, when a row is successfully written to consistency level replicas, data loss can occur if those replicas become permanently unavailable before the update is propagated to the remaining replicas. The following example illustrates this scenario.
-
Write to a table with a replication factor of three (RF=3) and a consistency level of ONE.
-
The base replica is also the coordinator node.
-
The coordinator responds to the client that the write was successful.
-
The machine hosting the coordinator node dies.
In the case of materialized views, the previous example carries additional implications. If the base table (coordinator node) successfully wrote the view update to another node, the row will exist only in the view but not in the base table, creating an orphaned view row.
Another scenario that can create an orphaned view row is when a base table row loses all replicas without repair between failures. If a view row loses its replicas, the base table row will not have its corresponding view row.
To avoid those situations, write to base tables with materialized views using consistency levels greater than ONE (such as LOCAL_QUORUM).
Alternatively, use the -Dmv_enable_coordinator_batchlog=true
option to provide better protection against a coordinator failing in the middle of a request.
Using the |
Related information