DSE Advanced Replication metrics

DSE Advanced Replication metrics on an edge node report the current status of that node in the edge cluster.

Collect metrics on each edge node to review the current status of that node in the edge cluster. A working edge and hub configuration is required to use the metrics feature. See Getting started.

Ensure JMX access 

Metrics are exposed through the DataStax Enterprise (Cassandra) JMX interface, so JMX access is required.
  • For production, DataStax recommends using JMX authentication.
  • Use these steps to enable local JMX access. Localhost access is useful for test and development.
  1. On the edge node, edit cassandra-env.sh and enable local JMX:
    JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=localhost"
    LOCAL_JMX=yes
  2. On the edge node, restart DataStax Enterprise so that it picks up the local JMX change.
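Once local JMX is enabled, any JMX client can connect to the node. The following Java sketch shows one way to do that with the JDK's javax.management.remote API. It assumes the standard RMI service URL (service:jmx:rmi:///jndi/rmi://host:port/jmxrmi) and port 7199, the port used in the dse advrep example in this section; the class name AdvRepJmxConnect is hypothetical. Credentials are passed only when JMX authentication is enabled.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper; not part of DSE.
class AdvRepJmxConnect {
    // Builds the standard RMI service URL for a JVM's JMX agent.
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Opens a JMX connection. Pass null for user when JMX authentication is
    // disabled (local JMX for test and development); pass credentials when
    // authentication is enabled, as recommended for production.
    static JMXConnector connect(String host, int port, String user, String password)
            throws IOException {
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        Map<String, Object> env = new HashMap<>();
        if (user != null) {
            env.put(JMXConnector.CREDENTIALS, new String[] {user, password});
        }
        return JMXConnectorFactory.connect(url, env);
    }

    public static void main(String[] args) throws IOException {
        // Requires a running edge node with JMX listening on localhost:7199.
        try (JMXConnector connector = connect("localhost", 7199, null, null)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            System.out.println("Registered MBeans: " + mbs.getMBeanCount());
        }
    }
}
```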

Display metrics on the command line 

Use the dse advrep command line tool to display metrics on the command line. Ensure that the edge node meets the command line prerequisites.
  1. On the edge node:
    dse advrep --port 7199 -v edge metrics
    Group                  | Type                              | Count
    ------------------------------------------------------------------
    ReplicationLogConsumer | Errors                            | 0
    Tables                 | MessagesDelivered                 | 4
    ReplicationChannel     | ReloadErrors                      | 0
    ReplicationChannel     | FastTrackPermitAcquisitionsDenied | 0
    ReplicationChannel     | DeliveryErrors                    | 0
    Tables                 | MessagesReceived                  | 4
    Tables                 | MessagesInReplicationLog          | 0
    ReplicationLog         | MessageAddErrors                  | 0
    ReplicationChannel     | SlowTrackPermitAcquisitionsDenied | 0
    
    Group | Type | Count | RateUnit | MeanRate | FifteenMinuteRate | Count | OneMinuteRate | ...
    
    Group                  | Type                          | 75thPercentile     | DurationUnit |...
    --------------------------------------------------------------------------------------------
    ReplicationChannel     | FastTrackEdgeToHubMessages    | 268650.95          | microseconds |...
    ReplicationChannel     | SlowTrackChannelToHubMessages | 30130.992000000002 | microseconds |...
    ReplicationChannel     | SlowTrackEdgeToHubMessages    | 5.3142810146E7     | microseconds |...
    Trigger                | ProcessedMessages             | 2346.799           | microseconds |...
    ReplicationLogConsumer | PeekingTimes                  | 152.321            | microseconds |...
    ReplicationChannel     | TotalEdgeToHubMessages        | 4.4285675122E7     | microseconds |...
    ReplicationChannel     | FastTrackChannelToHubMessages | 62479.625          | microseconds |...
    ReplicationChannel     | TotalChannelToHubMessages     | 30130.992000000002 | microseconds |...
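If you capture this output in a script, a small parser can turn the Group | Type | Count rows into a map and flag problem counters. The Java sketch below is illustrative only: the class AdvRepCounterTable and the rule that a counter whose name ends in Errors or Denied needs attention when it is above zero are assumptions, not part of the dse advrep tool. Rows from the meter and timer tables have more columns and are skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for the first table printed by `dse advrep ... edge metrics`.
class AdvRepCounterTable {
    // Maps "Group.Type" to Count for rows with exactly three columns and a
    // numeric count. Header, separator, and wider rows are ignored.
    static Map<String, Long> parseCounters(List<String> lines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : lines) {
            String[] cols = line.split("\\|");
            if (cols.length != 3) {
                continue; // separator line or a meter/timer row
            }
            String count = cols[2].trim();
            if (!count.matches("-?\\d+")) {
                continue; // header row ("Count")
            }
            counters.put(cols[0].trim() + "." + cols[1].trim(), Long.parseLong(count));
        }
        return counters;
    }

    // Returns error and permit-denial counters whose value is above zero.
    static List<String> nonZeroProblemCounters(Map<String, Long> counters) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            String name = e.getKey();
            boolean problemCounter = name.endsWith("Errors") || name.endsWith("Denied");
            if (problemCounter && e.getValue() > 0) {
                problems.add(name);
            }
        }
        return problems;
    }
}
```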
    

Accessing the metrics 

Use JMX to access the metrics. Each metric's MBean object name follows this rule, with spaces removed from the group and metric names:
com.datastax.bdp.advrep.metrics:type=Group_No_Spaces,name=Name_No_Spaces
For example, to access the DeliveryErrors metric in the ReplicationChannel group:
com.datastax.bdp.advrep.metrics:type=ReplicationChannel,name=DeliveryErrors
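A Java sketch of this naming rule, assuming the MBeans are reachable through an MBeanServerConnection (for example, one opened over the local JMX port). The class AdvRepMetricNames is hypothetical, and reading the value through an attribute named Count is an assumption based on the Count column in the dse advrep output.

```java
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper; names follow the rule type=<Group>,name=<Name>.
class AdvRepMetricNames {
    static final String DOMAIN = "com.datastax.bdp.advrep.metrics";

    // Builds the object name, removing spaces from the group and metric names.
    static ObjectName metric(String group, String name) {
        String objectName = DOMAIN + ":type=" + group.replace(" ", "")
                + ",name=" + name.replace(" ", "");
        try {
            return new ObjectName(objectName);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid metric name: " + objectName, e);
        }
    }

    // Reads the Count attribute (assumed attribute name) of a metric MBean.
    static long readCount(MBeanServerConnection mbs, String group, String name)
            throws Exception {
        Object value = mbs.getAttribute(metric(group, name), "Count");
        return ((Number) value).longValue();
    }
}
```

For example, readCount(mbs, "ReplicationChannel", "DeliveryErrors") reads the counter whose object name is shown above.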

Performance metrics

Metrics are exposed as JMX MBeans under the com.datastax.bdp.advrep.metrics domain and are logically divided into groups. Each group corresponds to a component of the replication architecture. The metric types are:
Counter
A simple incrementing and decrementing 64-bit integer.
Meter
Measures the rate at which a set of events occur.
Histogram
Measures the distribution of values in a stream of data.
Timer
A histogram of the duration of a type of event and a meter of the rate of its occurrence.
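Over JMX, each metric type exposes a different set of attributes. The Java sketch below lists them, assuming DSE registers its metrics using the Dropwizard (Codahale) Metrics JmxReporter naming convention; the column headings in the dse advrep output (Count, RateUnit, MeanRate, OneMinuteRate, FifteenMinuteRate, 75thPercentile, DurationUnit) are consistent with that convention. The enum AdvRepMetricType is hypothetical; confirm the attributes on your own node with a JMX browser such as JConsole.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Assumption: attribute names follow the Dropwizard Metrics JmxReporter
// convention. This is a lookup aid, not a DSE API.
enum AdvRepMetricType {
    COUNTER(Collections.singletonList("Count")),
    METER(Arrays.asList("Count", "RateUnit", "MeanRate",
            "OneMinuteRate", "FiveMinuteRate", "FifteenMinuteRate")),
    HISTOGRAM(Arrays.asList("Count", "Min", "Max", "Mean", "StdDev",
            "50thPercentile", "75thPercentile", "95thPercentile",
            "98thPercentile", "99thPercentile", "999thPercentile")),
    // A timer combines a meter (event rate) with a histogram of durations.
    TIMER(Arrays.asList("Count", "RateUnit", "MeanRate",
            "OneMinuteRate", "FiveMinuteRate", "FifteenMinuteRate",
            "DurationUnit", "Min", "Max", "Mean", "StdDev",
            "50thPercentile", "75thPercentile", "95thPercentile",
            "98thPercentile", "99thPercentile", "999thPercentile"));

    private final List<String> attributes;

    AdvRepMetricType(List<String> attributes) {
        this.attributes = Collections.unmodifiableList(attributes);
    }

    List<String> attributes() {
        return attributes;
    }
}
```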
Metrics are available for the following groups. Each group lists its metrics with a description and the metric type.

ReplicationChannel 

Metrics for the ReplicationChannel group:
Metric name | Description | Metric type
DeliveryErrors | The number of errors that occurred when delivering messages to the hub through the Cassandra driver. | Counter
FastTrackChannelToHubMessages | For messages replicated to the hub via fast track, the time from when the replication channel receives the message until the message is successfully replicated through the driver. | Timer
FastTrackEdgeToHubMessages | For messages replicated to the hub via fast track, the time from when the edge cluster receives the message until the message is successfully replicated through the driver. | Timer
FastTrackPermitAcquisitionsDenied | The number of times the replication channel could not obtain a permit to send a message to the hub via fast track because no permits were available at that time. | Counter
ReloadErrors | The number of errors that occurred when the replication channel was reloaded. | Counter
SlowTrackChannelToHubMessages | For messages replicated to the hub via slow track, the time from when the replication channel receives the message until the message is successfully replicated through the driver. | Timer
SlowTrackEdgeToHubMessages | For messages replicated to the hub via slow track, the time from when the edge cluster receives the message until the message is successfully replicated through the driver. | Timer
SlowTrackPermitAcquisitionsDenied | The number of times the replication channel could not obtain a permit to send a message to the hub via slow track because no permits were available at that time. | Counter
TotalChannelToHubMessages | For all messages replicated to the hub, the time from when the replication channel receives the message until the message is successfully replicated through the driver via fast track or slow track. | Timer
TotalEdgeToHubMessages | For all messages replicated to the hub, the time from when the edge cluster receives the message until the message is successfully replicated through the driver via fast track or slow track. | Timer

ReplicationLog 

Metrics for the ReplicationLog group:
Metric name | Description | Metric type
MessageAddErrors | The number of errors that occurred when adding a message to the replication log. | Counter
MessagesAcknowledged | The number of messages that were acknowledged and removed from the replication log. | Meter
MessagesAdded | The number of messages that were added to the replication log, and the rate at which they were added. | Meter
MessagesReleased | The number of messages in the replication log that were released from suspended status (explained after this table). | Meter
MessagesRemoved | The number of messages that were removed from the replication log, including acknowledged messages and messages removed by a truncate operation. | Meter
MessagesSize | The size of the messages received by the edge cluster and added to the replication log. | Histogram

When fast track messages are added to the replication log, they are marked as suspended so that they are not also replicated through the slow track. When a fast track message is released, its suspended status is removed so that the slow track can replicate it. Released messages are usually messages that were not successfully replicated to the hub through the fast track and must be retried through the slow track.
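To make the suspend and release behavior concrete, here is a simplified, hypothetical Java model. It is not how DSE implements the replication log; the class SuspendReleaseModel and its methods exist only for illustration. A fast track message is added as suspended and is invisible to the slow track until it is released, and each release corresponds to one MessagesReleased event.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of suspend/release; not DSE code.
class SuspendReleaseModel {
    // Message id -> true while the message is suspended (reserved for fast track).
    private final Map<String, Boolean> log = new LinkedHashMap<>();
    private long released; // analogous to the MessagesReleased count

    // Fast track messages are added as suspended so the slow track skips them.
    void add(String id, boolean fastTrack) {
        log.put(id, fastTrack);
    }

    // Called when fast track delivery fails: clear the suspended status so
    // that the slow track retries the message.
    boolean release(String id) {
        if (!Boolean.TRUE.equals(log.get(id))) {
            return false; // unknown id or not suspended
        }
        log.put(id, false);
        released++;
        return true;
    }

    // Acknowledged messages are removed from the log.
    void acknowledge(String id) {
        log.remove(id);
    }

    // Messages the slow track may replicate: those that are not suspended.
    List<String> slowTrackEligible() {
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : log.entrySet()) {
            if (!e.getValue()) {
                eligible.add(e.getKey());
            }
        }
        return eligible;
    }

    long releasedCount() {
        return released;
    }
}
```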

ReplicationLogConsumer 

Metrics for the ReplicationLogConsumer group:
Metric name | Description | Metric type
Errors | The number of errors that occurred when consuming messages from the replication log. | Counter
PeekedMessages | The number of messages that were consumed from the replication log. | Meter
PeekingTimes | The time spent consuming a batch of messages from the replication log. | Timer

Trigger 

Metrics for the Trigger group:
Metric name | Description | Metric type
ProcessedMessages | For messages received by the edge cluster for tables that are enabled for replication, the number of messages and the time spent processing each message (for example, adding it to the replication log) before control returns to the client. | Timer

AdvancedReplicationHub-metrics 

Metrics for the AdvancedReplicationHub-metrics group are reported automatically by the Cassandra Java driver. The following list contains some examples and is not complete:
Metric name | Metric type
known-hosts | Gauge
connected-to | Gauge
open-connections | Gauge
requests-timer | Timer
connection-errors | Counter
write-timeouts | Counter
read-timeouts | Counter
unavailables | Counter
other-errors | Counter
retries | Counter
ignores | Counter
For details, see the Java driver documentation.

Performance metrics per table 

Per-table performance metrics are also available through JMX. Their object names follow this rule:
com.datastax.bdp.advrep.metrics:type=Tables,scope=keyspace.table,name=Name_No_Spaces
For example, the MessagesDelivered metric for the table sensor_readings in the keyspace demo has this object name:
com.datastax.bdp.advrep.metrics:type=Tables,scope=demo.sensor_readings,name=MessagesDelivered
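A Java sketch for building these per-table names. The class AdvRepTableMetrics is hypothetical. The allTables pattern uses the standard JMX property-list wildcard (,*), so passing it to MBeanServerConnection.queryNames(pattern, null) should return the matching MBean for each replicated table.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper for type=Tables,scope=<keyspace>.<table>,name=<metric>.
class AdvRepTableMetrics {
    private static final String PREFIX = "com.datastax.bdp.advrep.metrics:type=Tables";

    // Object name of one metric for one table.
    static ObjectName tableMetric(String keyspace, String table, String metric) {
        return parse(PREFIX + ",scope=" + keyspace + "." + table + ",name=" + metric);
    }

    // Pattern that matches one metric across all tables (any scope).
    static ObjectName allTables(String metric) {
        return parse(PREFIX + ",name=" + metric + ",*");
    }

    private static ObjectName parse(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid object name: " + name, e);
        }
    }
}
```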
The following metrics are provided per table:
Metric name | Description | Metric type
MessagesDelivered | The number of messages for this table that were replicated to the hub. | Counter
MessagesInReplicationLog | The estimated number of messages for this table currently in the replication log: the difference between MessagesReceived and MessagesDelivered since this node was started. This counter can be negative if there were messages in the replication log when the node was started, or if messages were delivered to the hub multiple times. | Counter
MessagesReceived | The number of messages received from the edge cluster for this table. | Counter
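The MessagesInReplicationLog estimate is simple arithmetic on the other two per-table counters. A minimal Java sketch, using the hypothetical class ReplicationLogBacklog, that computes the estimate and explains negative values using the two documented causes:

```java
// Hypothetical helper; mirrors the documented definition of MessagesInReplicationLog.
class ReplicationLogBacklog {
    // MessagesReceived - MessagesDelivered, both counted since node start.
    static long estimate(long received, long delivered) {
        return received - delivered;
    }

    // Interprets the estimate, including the documented causes of negative values.
    static String describe(long received, long delivered) {
        long backlog = estimate(received, delivered);
        if (backlog < 0) {
            return "negative (" + backlog + "): messages were likely already in the log"
                    + " at startup, or some were delivered to the hub more than once";
        }
        return backlog + " message(s) estimated in the replication log";
    }
}
```

For the sample output earlier in this section (MessagesReceived 4, MessagesDelivered 4), the estimate is 0, which matches the MessagesInReplicationLog row.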
The location of the cassandra-env.sh file depends on the type of installation:
  • Package installations: /etc/dse/cassandra/cassandra-env.sh
  • Tarball installations: install_location/resources/cassandra/conf/cassandra-env.sh