Troubleshooting scenarios

Refer to the following troubleshooting scenarios for information about resolving common migration issues. Each section presents:

  • Symptoms

  • Cause

  • Solution or Workaround

Configuration changes are not being applied by the automation

Symptoms

You changed the values of some configuration variables in the automation and then rolled them out using the rolling_update_zdm_proxy.yml playbook, but these changes are not taking effect on your ZDM Proxy instances.


Cause

The ZDM Proxy configuration comprises a number of variables, but only a subset of these can be changed on an existing deployment in a rolling fashion. The variables that can be changed with a rolling update are listed here.

All other configuration variables excluded from the list above are considered immutable and can only be changed by a redeployment. This is by design: immutable configuration variables should not be changed after finalizing the deployment prior to starting the migration, so allowing them to be changed through a rolling update would risk accidentally propagating some misconfiguration that could compromise the deployment’s integrity.

Solution or Workaround

To change the value of configuration variables that are considered immutable, simply run the deploy_zdm_proxy.yml playbook again. This playbook can be run as many times as necessary and will just recreate the entire ZDM Proxy deployment from scratch with the provided configuration. Please note that this does not happen in a rolling fashion: the existing ZDM Proxy instances will be torn down all at the same time prior to being recreated, resulting in a brief window in which the whole ZDM Proxy deployment will become unavailable.

Unsupported protocol version error on the client application

Symptoms

In the Java driver 4.x logs, the following messages can appear during or after session initialization.

[s0|/] Fatal error while initializing pool, forcing the node down (UnsupportedProtocolVersionException: [/] Host does not support protocol version DSE_V2)

[s0|/] Fatal error while initializing pool, forcing the node down (UnsupportedProtocolVersionException: [/] Host does not support protocol version DSE_V2)

[s0|/] Fatal error while initializing pool, forcing the node down (UnsupportedProtocolVersionException: [/] Host does not support protocol version DSE_V2)

[s0] Failed to connect with protocol DSE_V1, retrying with V4

[s0] Failed to connect with protocol DSE_V2, retrying with DSE_V1


Cause

JAVA-2905 is a driver bug that manifests itself in this way. It affects Java driver 4.x and was fixed in the 4.10.0 release.

Solution or Workaround

If you are using Spring Boot and/or spring-data-cassandra, upgrade these dependencies to a version that includes the Java driver fix.

Alternatively, you can force the protocol version on the driver to the highest version supported by both clusters. V4 is a good choice that usually works in all cases. However, if you are migrating from DSE to DSE, use DSE_V1 for DSE 5.x and DSE_V2 for DSE 6.x.
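The version guidance above can be sketched as a small helper. This is only an illustration: the cluster labels (for example "dse-6.8" or "astra") and the function names are hypothetical, and the handling of a mixed DSE 5.x / DSE 6.x migration (falling back to DSE_V1 as the highest version common to both) is an assumption derived from the "highest version supported by both clusters" rule.

```python
def dse_major(label: str):
    """Return the DSE major version from a hypothetical label like "dse-6.8".

    Returns None for anything that is not labelled as DSE.
    """
    if not label.lower().startswith("dse-"):
        return None
    return int(label.split("-", 1)[1].split(".", 1)[0])


def protocol_version_to_force(origin: str, target: str) -> str:
    """Pick the protocol version to force on the driver for a migration."""
    majors = [dse_major(origin), dse_major(target)]
    if None in majors:
        # Not a DSE-to-DSE migration: use the general recommendation.
        return "V4"
    lowest = min(majors)
    if lowest == 6:
        return "DSE_V2"
    if lowest == 5:
        # Assumption: a DSE 6.x cluster also accepts DSE_V1.
        return "DSE_V1"
    raise ValueError("DSE version not covered by the guidance above")
```

For example, `protocol_version_to_force("cassandra-3.11", "astra")` returns "V4", while a DSE 6.8 to DSE 6.8 migration returns "DSE_V2".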

To force the protocol version on the Java driver, check this section of the driver manual. We don’t believe this issue affects Java driver 3.x, but here are the instructions on how to force the version on 3.x, if necessary.

Protocol errors in the proxy logs but clients can connect successfully

Symptoms

ZDM Proxy logs contain:

{"log":"time=\"2022-10-01T12:02:12Z\" level=debug msg=\"[TARGET-CONNECTOR] Protocol v5 detected while decoding a frame.
Returning a protocol error to the client to force a downgrade: ERROR PROTOCOL ERROR (code=ErrorCode ProtocolError [0x0000000A],
msg=Invalid or unsupported protocol version (5)).\"\n","stream":"stderr","time":"2022-07-20T12:02:12.379287735Z"}


Cause

Protocol errors like these are a normal part of the handshake process where the protocol version is being negotiated. These protocol version downgrades happen when either the ZDM Proxy or at least one of the clusters doesn’t support the version requested by the client.

V5 downgrades are enforced by the ZDM Proxy but any other downgrade is requested by one of the clusters when they don’t support the version that the client requested. The proxy supports V3, V4, DSE_V1 and DSE_V2.

Solution or Workaround

These log messages are informational only (log level DEBUG), so no action is required.

If you find one of these messages with a higher log level (especially level=error) then there might be a bug. At that point the issue will need to be investigated by the ZDM team. This log message with a log level of ERROR means that the protocol error occurred after the handshake, and this is a fatal unexpected error that results in a disconnect for that particular connection.
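To separate the harmless DEBUG entries from the ones worth investigating, a log scan along the following lines may help. It is a minimal sketch with hypothetical function names. It assumes one log entry per line, either in the Docker JSON wrapper or in the plain `level=...` format shown in this document. Entries whose level cannot be determined are reported as "unknown" rather than silently dropped.

```python
import json
import re

LEVEL_RE = re.compile(r"level=(\w+)")


def unwrap(line: str) -> str:
    """Return the proxy message from a Docker JSON log line, or the line itself."""
    line = line.strip()
    if line.startswith("{"):
        try:
            return json.loads(line).get("log", line)
        except json.JSONDecodeError:
            return line
    return line


def protocol_errors_above_debug(lines):
    """Yield (level, message) for protocol errors not logged at debug level."""
    for raw in lines:
        message = unwrap(raw)
        if "PROTOCOL ERROR" not in message:
            continue
        match = LEVEL_RE.search(message)
        level = match.group(1).lower() if match else "unknown"
        if level != "debug":
            yield level, message.strip()
```

Passing an open log file to `protocol_errors_above_debug` yields nothing when all protocol errors come from normal version negotiation.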

Error during proxy startup: Invalid or unsupported protocol version: 3

Symptoms

ZDM Proxy logs contain:

time="2022-10-01T19:58:15+01:00" level=info msg="Starting proxy..."
time="2022-10-01T19:58:15+01:00" level=info msg="Parsed Topology Config: TopologyConfig{VirtualizationEnabled=false, Addresses=[], Count=1, Index=0, NumTokens=8}"
time="2022-10-01T19:58:15+01:00" level=info msg="Parsed Origin contact points: []"
time="2022-10-01T19:58:15+01:00" level=info msg="Parsed Target contact points: []"
time="2022-10-01T19:58:15+01:00" level=info msg="TLS was not configured for Origin"
time="2022-10-01T19:58:15+01:00" level=info msg="TLS was not configured for Target"
time="2022-10-01T19:58:15+01:00" level=info msg="[openTCPConnection] Opening connection to"
time="2022-10-01T19:58:15+01:00" level=info msg="[openTCPConnection] Successfully established connection with"
time="2022-10-01T19:58:15+01:00" level=debug msg="performing handshake"
time="2022-10-01T19:58:15+01:00" level=error msg="cqlConn{conn:}: handshake failed: expected AUTHENTICATE or READY, got ERROR PROTOCOL ERROR (code=ErrorCode ProtocolError [0x0000000A], msg=Invalid or unsupported protocol version: 3)"
time="2022-10-01T19:58:15+01:00" level=warning msg="Error while initializing a new cql connection for the control connection of ORIGIN: failed to perform handshake: expected AUTHENTICATE or READY, got ERROR PROTOCOL ERROR (code=ErrorCode ProtocolError [0x0000000A], msg=Invalid or unsupported protocol version: 3)"
time="2022-10-01T19:58:15+01:00" level=debug msg="Shutting down request loop on cqlConn{conn:}"
time="2022-10-01T19:58:15+01:00" level=debug msg="Shutting down response loop on cqlConn{conn:}."
time="2022-10-01T19:58:15+01:00" level=debug msg="Shutting down event loop on cqlConn{conn:}."
time="2022-10-01T19:58:15+01:00" level=error msg="Couldn't start proxy: failed to initialize origin control connection: could not open control connection to ORIGIN, tried endpoints: []."
time="2022-10-01T19:58:15+01:00" level=info msg="Initiating proxy shutdown..."
time="2022-10-01T19:58:15+01:00" level=debug msg="Requesting shutdown of the client listener..."
time="2022-10-01T19:58:15+01:00" level=debug msg="Requesting shutdown of the client handlers..."
time="2022-10-01T19:58:15+01:00" level=debug msg="Waiting until all client handlers are done..."
time="2022-10-01T19:58:15+01:00" level=debug msg="Requesting shutdown of the control connections..."
time="2022-10-01T19:58:15+01:00" level=debug msg="Waiting until control connections done..."
time="2022-10-01T19:58:15+01:00" level=debug msg="Shutting down the schedulers and metrics handler..."
time="2022-10-01T19:58:15+01:00" level=info msg="Proxy shutdown complete."
time="2022-10-01T19:58:15+01:00" level=error msg="Couldn't start proxy, retrying in 2.229151525s: failed to initialize origin control connection: could not open control connection to ORIGIN, tried endpoints: []."


Cause

The control connections of the ZDM Proxy don’t perform protocol version negotiation; they only attempt to use protocol version 3. If one of the origin clusters doesn’t support at least V3 (e.g. Cassandra 2.0, DSE 4.6), then ZDM cannot be used for that migration at the moment. We plan to introduce support for Cassandra 2.0 and DSE 4.6 very soon.

Authentication errors

Symptoms

{"log":"\u001b[33mWARN\u001b[0m[0110] Secondary (TARGET) handshake failed with an auth error, returning ERROR AUTHENTICATION ERROR (code=ErrorCode AuthenticationError [0x00000100], msg=We recently improved your database security. To find out more and reconnect, see to client. \r\n","stream":"stdout","time":"2022-09-06T18:31:31.348472345Z"}


Cause

Credentials are incorrect or have insufficient permissions.

There are three sets of credentials in play with ZDM:

  • Target: credentials that you set in the proxy configuration through the ZDM_TARGET_USERNAME and ZDM_TARGET_PASSWORD settings.

  • Origin: credentials that you set in the proxy configuration through the ZDM_ORIGIN_USERNAME and ZDM_ORIGIN_PASSWORD settings.

  • Client: credentials that the client application sends to the proxy during the connection handshake. These are set in the application configuration, not the proxy configuration.

This error means that at least one of these three sets of credentials is incorrect or has insufficient permissions.

Solution or Workaround

If the authentication error is preventing the proxy from starting then it’s either the Origin or Target credentials that are incorrect or have insufficient permissions. The log message shows whether it is the Target or Origin handshake that is failing.

If the proxy is able to start up — that is, this message can be seen in the logs:

Proxy started. Waiting for SIGINT/SIGTERM to shutdown.

then the authentication error is happening when a client application tries to open a connection to the proxy. In this case, the issue is with the Client credentials so the application itself is using invalid credentials (incorrect username/password or insufficient permissions).

Note that the proxy startup message has log level INFO, so if the configured log level on the proxy is warning or error, you will have to rely on other ways to determine whether the ZDM Proxy started correctly. For example, check whether the Docker container is running (or the process, if Docker isn’t being used), or whether there is a log message similar to Error launching proxy.
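The decision procedure above can be written down as a short helper. This is a sketch only: the function name is hypothetical, and matching on the words ORIGIN, TARGET, and "auth" in the failing log line is an assumption about the message format. It also relies on the INFO-level startup message being present, so it is not reliable when the proxy log level is warning or error.

```python
STARTED_MARKER = "Proxy started. Waiting for SIGINT/SIGTERM to shutdown."


def credentials_to_check(log_lines):
    """Return which credential sets to review, given the proxy log lines."""
    lines = list(log_lines)
    if any(STARTED_MARKER in line for line in lines):
        # The proxy started, so the failure happens on client connections.
        return ["Client"]
    suspects = []
    for line in lines:
        if "auth" not in line.lower():
            continue
        # Assumption: the failing handshake names the cluster in upper case.
        for marker, name in (("ORIGIN", "Origin"), ("TARGET", "Target")):
            if marker in line and name not in suspects:
                suspects.append(name)
    # If no log line identifies the cluster, both sets remain candidates.
    return suspects or ["Origin", "Target"]
```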

The ZDM Proxy listens on a custom port, and all applications are able to connect to one proxy instance only

Symptoms

The ZDM Proxy is listening on a custom port (not 9042) and:

  • The Grafana dashboard shows only one proxy instance receiving all the connections from the application.

  • Only one proxy instance has log messages such as level=info msg="Accepted connection from".


Cause

The application is specifying the custom port as part of the contact points using the format <proxy_ip_address>:<proxy_custom_port>.

For example, using the Java driver, if the ZDM Proxy instances were listening on port 14035, this would look like:

.addContactPoints("<proxy_ip_address_1>:14035", "<proxy_ip_address_2>:14035", "<proxy_ip_address_3>:14035")

The contact point is used as the first point of contact to the cluster, but the driver discovers the rest of the nodes via CQL queries. However, this discovery process doesn’t discover the ports, just the addresses so the driver uses the addresses it discovers with the port that is configured at startup.

As a result, port 14035 will only be used for the contact point initially discovered, while for all other nodes the driver will attempt to use the default 9042 port.
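The effect can be illustrated with a simplified model of the behavior described above. This is not the driver's actual code: the function names are hypothetical, the addresses come from the documentation-only range 192.0.2.0/24, and the model handles only IPv4 addresses or hostnames.

```python
DEFAULT_CQL_PORT = 9042


def split_host_port(contact_point: str, default_port: int):
    """Split "host:port" into (host, port); use default_port if no port is given."""
    host, sep, port = contact_point.partition(":")
    return host, (int(port) if sep else default_port)


def endpoints_used(contact_points, discovered_hosts, configured_port=DEFAULT_CQL_PORT):
    """Model which port is used for each host after node discovery.

    Only the initial contact point keeps a port embedded in its address.
    Every discovered host gets the port configured on the driver.
    """
    initial_host, initial_port = split_host_port(contact_points[0], configured_port)
    endpoints = {initial_host: initial_port}
    for host in discovered_hosts:
        endpoints.setdefault(host, configured_port)
    return endpoints


hosts = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
# Port embedded in the contact points: only the first host is reached on 14035.
embedded = endpoints_used([h + ":14035" for h in hosts], hosts)
# Port configured explicitly on the driver: every host is reached on 14035.
explicit = endpoints_used(hosts, hosts, configured_port=14035)
```

In this model, `embedded` maps the second and third hosts to port 9042, which is why only one proxy instance receives connections.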

Solution or Workaround

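In the application, ensure that the custom port is explicitly indicated using the .withPort(<customPort>) API. For the example in which the ZDM Proxy instances listen on port 14035:

.addContactPoints("<proxy_ip_address_1>", "<proxy_ip_address_2>", "<proxy_ip_address_3>").withPort(14035)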

Syntax error "no viable alternative at input 'CALL'" in proxy logs

Symptoms

ZDM Proxy logs contain:

{"log":"time=\"2022-10-01T13:10:47Z\" level=debug msg=\"Recording TARGET-CONNECTOR other error:
ERROR SYNTAX ERROR (code=ErrorCode SyntaxError [0x00002000], msg=line 1:0 no viable alternative
at input 'CALL' ([CALL]...))\"\n","stream":"stderr","time":"2022-07-20T13:10:47.322882877Z"}


Cause

The log message indicates that the server doesn’t recognize the word “CALL” in the query string, which most likely means that it is an RPC (remote procedure call). From the proxy logs alone, it is not possible to see what method is being called by the query, but it is very likely the RPC that the drivers use to send DSE Insights data to the server.

Most DataStax drivers have DSE Insights reporting enabled by default when they detect a server version that supports it (regardless of whether the feature is enabled on the server side or not). The driver might also have it enabled for Astra DB depending on what server version Astra DB is returning for queries involving the system.local and system.peers tables.

Solution or Workaround

These log messages are harmless, but if you need to get rid of them, you can disable the DSE Insights driver feature through the driver configuration. Refer to this property for Java driver 4.x.

Default Grafana credentials don’t work

Symptoms

You deployed the metrics component of the ZDM Proxy Automation, which includes a Grafana instance, but you cannot log in using the usual default admin/admin credentials.


Cause

The ZDM Proxy Automation specifies a custom set of credentials instead of relying on the admin/admin ones that are typically the default for Grafana deployments.

Solution or Workaround

Check the credentials that are being used by looking up the vars/zdm_monitoring_config.yml file in the ZDM Proxy Automation directory. These credentials can also be modified before deploying the metrics stack.

Proxy starts but client cannot connect (connection timeout/closed)

Symptoms

ZDM Proxy log contains:

INFO[0000] [openTCPConnection] Opening connection to
INFO[0000] [openTCPConnection] Successfully established connection with
INFO[0000] [openTLSConnection] Opening TLS connection to using underlying TCP connection
INFO[0000] [openTLSConnection] Successfully established connection with
INFO[0000] Successfully opened control connection to ORIGIN using endpoint
INFO[0000] [openTCPConnection] Opening connection to
INFO[0000] [openTCPConnection] Successfully established connection with
INFO[0000] [openTLSConnection] Opening TLS connection to 211d66bf-de8d-48ac-a25b-bd57d504bd7c using underlying TCP connection
INFO[0000] [openTLSConnection] Successfully established connection with 211d66bf-de8d-48ac-a25b-bd57d504bd7
INFO[0000] Successfully opened control connection to TARGET using endpoint
INFO[0000] Proxy connected and ready to accept queries on
INFO[0000] Proxy started. Waiting for SIGINT/SIGTERM to shutdown.
INFO[0043] Accepted connection from
INFO[0043] [ORIGIN-CONNECTOR] Opening request connection to ORIGIN (
ERRO[0043] [openTCPConnectionWithBackoff] Couldn't connect to, retrying in 100ms...
ERRO[0043] [openTCPConnectionWithBackoff] Couldn't connect to, retrying in 200ms...
ERRO[0043] [openTCPConnectionWithBackoff] Couldn't connect to, retrying in 400ms...
ERRO[0043] [openTCPConnectionWithBackoff] Couldn't connect to, retrying in 800ms...
ERRO[0044] [openTCPConnectionWithBackoff] Couldn't connect to, retrying in 1.6s...
ERRO[0046] [openTCPConnectionWithBackoff] Couldn't connect to, retrying in 3.2s...
ERRO[0049] [openTCPConnectionWithBackoff] Couldn't connect to, retrying in 6.4s...
ERRO[0056] [openTCPConnectionWithBackoff] Couldn't connect to, retrying in 10s...
ERRO[0066] [openTCPConnectionWithBackoff] Couldn't connect to, retrying in 10s...
ERRO[0076] Client Handler could not be created: ORIGIN-CONNECTOR context timed out or cancelled while opening connection to ORIGIN: context deadline exceeded


Cause

ZDM Proxy has connectivity only to a subset of the nodes.

The control connection (during ZDM Proxy startup) cycles through the nodes until it finds one that can be connected to. For client connections, each proxy instance cycles through its "assigned nodes" only. (The "assigned nodes" are a different subset of the cluster nodes for each proxy instance, generally non-overlapping between proxy instances so as to avoid any interference with the load balancing already in place at client-side driver level. The assigned nodes are not necessarily contact points: even discovered nodes undergo assignment to proxy instances.)

In the example above, the ZDM Proxy doesn’t have connectivity to the node that was chosen as the origin node for the incoming client connection, but it was able to connect to a different node during startup.

Solution or Workaround

Ensure that network connectivity exists and is stable between the ZDM Proxy instances and all Cassandra / DSE nodes of the local datacenter.

Client application driver takes too long to reconnect to a proxy instance

Symptoms

After a ZDM Proxy instance has been unavailable for some time and comes back up, the client application takes too long to reconnect.

There should never be a reason to stop a ZDM Proxy instance other than a configuration change. However, the proxy might have crashed, or a configuration change might have taken a long time to complete before the ZDM Proxy was back up.


Cause

The ZDM Proxy does not send topology events to the client applications, so the time it takes for the driver to reconnect to a ZDM Proxy instance is determined by the driver’s reconnection policy.

Solution or Workaround

Restart the client application to force an immediate reconnect.

If you expect ZDM Proxy instances to go down frequently, change the reconnection policy on the driver so that the interval between reconnection attempts has a shorter limit.
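To see why a shorter limit helps, consider a simplified model of an exponential reconnection schedule. The function name is hypothetical and the base and maximum delays used below are illustrative values, not defaults confirmed by this document. Real driver policies may also add jitter, which this sketch ignores.

```python
def reconnection_delays(base_delay_s: float, max_delay_s: float, attempts: int):
    """Return the wait before each reconnection attempt, in seconds.

    Each delay doubles until it reaches max_delay_s, then stays there.
    """
    return [min(base_delay_s * 2 ** n, max_delay_s) for n in range(attempts)]


# Illustrative values: after a long outage the driver waits up to 60 s
# between attempts, so it may notice a recovered proxy up to 60 s late.
long_cap = reconnection_delays(1, 60, 8)
# With a 5 s cap, the driver notices the recovered proxy within about 5 s.
short_cap = reconnection_delays(1, 5, 8)
```

Once the schedule has reached its cap, the cap is roughly the worst-case delay between the proxy becoming available and the next reconnection attempt.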

Error with Astra DevOps API when using the ZDM Proxy Automation

Symptoms

ZDM Proxy Automation logs contain:

fatal: []: FAILED! => {"changed": false, "elapsed": 0, "msg": "Status code was -1 and not [200]:
Connection failure: Remote end closed connection without response", "redirected": false, "status": -1, "url":


Cause

The Astra DevOps API is likely temporarily unavailable.

Solution or Workaround

Download the Astra DB Secure Connect Bundle (SCB) manually and provide its path to the ZDM Proxy Automation as explained here. For information about the SCB, see working with Secure Connect Bundle.

Metadata service (Astra) returned not successful status code 4xx or 5xx

Symptoms

The ZDM Proxy doesn’t start and the following appears in the proxy logs:

Couldn't start proxy: error initializing the connection configuration or control connection for Target:
metadata service (Astra) returned not successful status code


Cause

There are two possible causes for this:

  1. The credentials that the ZDM Proxy is using for Astra DB don’t have sufficient permissions.

  2. The Astra database is hibernated.

Solution or Workaround

Start by opening Astra Portal and checking the Status of your database. If it is Hibernated, click the “Resume” button and wait for it to become Active. If it is Active already, then it is likely an issue with permissions.

We recommend starting with a token that has the Database Administrator role in Astra DB to confirm that it is a permissions issue. Refer to Manage user permissions.

Async read timeouts / stream id map exhausted

Symptoms

Dual reads are enabled and the following messages are found in the ZDM Proxy logs:

{"log":"\u001b[33mWARN\u001b[0m[430352] Async Request (OpCode EXECUTE [0x0A]) timed out after 10000 ms. \r\n","stream":"stdout","time":"2022-10-03T17:29:42.548941854Z"}

{"log":"\u001b[33mWARN\u001b[0m[430368] Could not find async request context for stream id 331 received from async connector. It either timed out or a protocol error occurred. \r\n","stream":"stdout","time":"2022-10-03T17:29:58.378080933Z"}

{"log":"\u001b[33mWARN\u001b[0m[431533] Could not send async request due to an error while storing the request state: stream id map ran out of stream ids: channel was empty. \r\n","stream":"stdout","time":"2022-10-03T17:49:23.786335428Z"}


Cause

The last log message is logged when the async connection runs out of stream ids. The async connection is a connection dedicated to the async reads (asynchronous dual reads feature). This can be caused by timeouts (first log message) or the connection not being able to keep up with the load.

If the log files are being spammed with these messages then it is likely that an outage occurred which caused all responses to arrive after requests timed out (second log message). In this case the async connection might not be able to recover.

Solution or Workaround

Keep in mind that errors in the async request path (dual reads) do not affect the client application. These log messages can be useful for predicting what may happen when reads are switched over to the TARGET cluster, but async read errors and warnings by themselves have no impact on the client.

Starting in version 2.1.0, you can tune the maximum number of stream ids available per connection, which by default is 2048. You can increase it to match your driver configuration through the zdm_proxy_max_stream_ids property.
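As a rough way to reason about this setting, the number of stream ids in use at any moment is approximately the request rate multiplied by how long each request holds its id. The sketch below applies that arithmetic with hypothetical function names. It assumes that a stream id is held until the response arrives or the request times out, which this document does not state explicitly.

```python
import math


def stream_ids_needed(requests_per_second: float, hold_time_ms: float) -> int:
    """Estimate how many stream ids are in use at a given rate and hold time."""
    return math.ceil(requests_per_second * hold_time_ms / 1000)


def max_sustainable_rate(max_stream_ids: int, hold_time_ms: float) -> float:
    """Estimate the request rate at which the stream id pool is fully used."""
    return max_stream_ids * 1000 / hold_time_ms


# Healthy case: 500 async reads per second, each answered in about 20 ms.
healthy = stream_ids_needed(500, 20)
# Outage case: every request holds its id for the full 10000 ms timeout.
during_outage = stream_ids_needed(500, 10_000)
```

Under these assumptions, 500 requests per second need about 10 ids when responses take 20 ms, but about 5000 ids when every request waits for the 10000 ms timeout, which exceeds the default of 2048.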

If these errors are being constantly written to the log files (for minutes or even hours), then it is likely that only an application or ZDM Proxy restart will fix it. If you find an issue like this, please submit an issue in our GitHub repo.

Client application closed connection errors every 10 minutes when migrating to Astra DB

This issue is fixed in ZDM Proxy 2.1.0. See the Solution or Workaround section below.

Symptoms

Every 10 minutes a message is logged in the ZDM Proxy logs showing a disconnect that was caused by Astra DB.

{"log":"\u001b[36mINFO\u001b[0m[426871] [TARGET-CONNECTOR] REDACTED disconnected \r\n","stream":"stdout","time":"2022-10-01T16:31:41.48598498Z"}


Cause

Astra DB terminates idle connections after 10 minutes of inactivity. If a client application is only sending reads through a connection then the Target (i.e. Astra in this case) connection will not get any traffic because ZDM forwards all reads to the Origin connection.

Solution or Workaround

This issue has been fixed in ZDM Proxy 2.1.0. We encourage you to upgrade to that version or later. By default, ZDM Proxy now sends heartbeats after 30 seconds of inactivity on a cluster connection, to keep it alive. You can tune the heartbeat interval with the Ansible configuration variable heartbeat_interval_ms, or by directly setting the ZDM_HEARTBEAT_INTERVAL_MS environment variable if you do not use the ZDM Proxy Automation.

Performance degradation with ZDM

Symptoms

Consider a case where a user runs separate benchmarks against:

  • Astra DB directly

  • Origin directly

  • ZDM (with Astra DB and Origin)

The results of these tests show latency/throughput values are worse with ZDM than when connecting to Astra DB or Origin directly.


Cause

ZDM will always add additional latency which, depending on the nature of the test, will also result in a lower throughput. Whether this performance hit is expected or not depends on the difference between the ZDM test results and the test results with the cluster that performed the worst.

Writes in ZDM require an ACK from both clusters while reads only require the result from the Origin cluster (or target if the proxy is set up to route reads to the target cluster). This means that if Origin has better performance than Target then ZDM will inevitably have a worse performance for writes.
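The routing rules above imply a simple way to estimate the latency to expect through the proxy. The following is a back-of-the-envelope sketch with a hypothetical function name. It assumes that writes are sent to both clusters concurrently, so the slower cluster dominates, and that the proxy overhead is a fixed additive cost, which is a simplification.

```python
def expected_latency_ms(origin_ms: float, target_ms: float, proxy_overhead_ms: float,
                        is_write: bool, reads_from_target: bool = False) -> float:
    """Estimate request latency through the proxy from per-cluster latencies."""
    if is_write:
        # A write completes only after both clusters have acknowledged it.
        cluster_ms = max(origin_ms, target_ms)
    elif reads_from_target:
        cluster_ms = target_ms
    else:
        cluster_ms = origin_ms
    return cluster_ms + proxy_overhead_ms


# Illustrative numbers: Origin answers in 2 ms, Target in 5 ms, proxy adds 1 ms.
write_ms = expected_latency_ms(2, 5, 1, is_write=True)
read_ms = expected_latency_ms(2, 5, 1, is_write=False)
```

With these illustrative numbers, writes through the proxy take about 6 ms even though Origin alone answers in 2 ms, while reads take about 3 ms.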

From our testing benchmarks, a performance degradation of up to 2x latency is not unheard of even without external factors adding more latency, but it is still worth checking some things that might add additional latency like whether the proxy is deployed on the same Availability Zone (AZ) as the Origin cluster or application instances.

Simple statements and batch statements make the proxy add more latency than prepared statements do. Simple statements are discouraged, especially with the ZDM Proxy, because the proxy currently takes a considerable amount of time just parsing the queries, whereas with prepared statements the proxy only has to parse each query once.

Solution or Workaround

If you are using simple statements, consider using prepared statements as the best first step.

Increasing the number of proxies might help, but only if the VMs’ resources (CPU, RAM or network IO) are near capacity. The ZDM Proxy doesn’t use a lot of RAM, but it uses a lot of CPU and network IO.

Deploying the proxy instances on VMs with faster CPUs and faster network IO might help, but only your own tests will reveal whether it helps, because it depends on the workload type and details about your environment such as network/VPC configurations, hardware, and so on.


InsightsRpc related permissions errors

Symptoms

ZDM Proxy logs contain:

time="2023-05-05T19:14:31Z" level=debug msg="Recording ORIGIN-CONNECTOR other error: ERROR UNAUTHORIZED (code=ErrorCode Unauthorized [0x00002100], msg=User my_user has no EXECUTE permission on <rpc method InsightsRpc.reportInsight> or any of its parents)"
time="2023-05-05T19:14:31Z" level=debug msg="Recording TARGET-CONNECTOR other error: ERROR SERVER ERROR (code=ErrorCode ServerError [0x00000000], msg=Unexpected persistence error: Unable to authorize statement com.datastax.bdp.cassandra.cql3.RpcCallStatement)"


Cause

This can happen if the origin (DSE) cluster has Metrics Collector enabled to report metrics for DataStax drivers and my_user does not have the required permissions. ZDM Proxy simply passes these requests through.

Solution or Workaround

There are two options to get this fixed.

Option 1: Disable DSE Metrics Collector

  • On the origin DSE cluster, run dsetool insights_config --mode DISABLED

  • Run dsetool insights_config --show_config and ensure the mode has a value of DISABLED

Option 2: Use this option if disabling metrics collector is not an option

  • Using a superuser role, grant the appropriate permissions to my_user role by running GRANT EXECUTE ON REMOTE OBJECT InsightsRpc TO my_user;


© 2024 DataStax | Privacy policy | Terms of use
