Manage your ZDM Proxy instances

In this topic, we’ll learn how to perform simple operations on your ZDM Proxy deployment with no interruption to its availability:

  • Do a simple rolling restart of the ZDM Proxy instances

  • View or collect the logs of all ZDM Proxy instances

  • Change a mutable configuration variable

  • Upgrade the ZDM Proxy version

With ZDM Proxy Automation, you can use Ansible playbooks for all of these operations.

Perform a rolling restart of the proxies

Rolling restarts of the ZDM Proxy instances are useful to apply configuration changes or to upgrade the ZDM Proxy version without impacting the availability of the deployment.


If you use ZDM Proxy Automation to manage your ZDM Proxy deployment, you can use a dedicated playbook to perform rolling restarts of all ZDM Proxy instances in a deployment:

  1. Connect to the Ansible Control Host Docker container. You can do this from the jumphost machine by running the following command:

    docker exec -it zdm-ansible-container bash
    The result is a shell prompt inside the container:
    ubuntu@52772568517c:~$
  2. Run the rolling restart playbook:

    ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory

    While running, this playbook gracefully stops one container and waits for it to shut down before restarting the container. Then, it calls the readiness endpoint to check the container’s status:

    • If the check fails, the playbook repeats the check every five seconds for a maximum of six attempts. If all six attempts fail, the playbook interrupts the entire rolling restart process.

    • If the check succeeds, the playbook waits before proceeding to the next container.

      The default pause between containers is 10 seconds. You can change the pause duration in zdm-proxy-automation/ansible/vars/zdm_playbook_internal_config.yml.

If you don’t use ZDM Proxy Automation, you must manually restart each instance.

To avoid downtime, wait for each instance to fully restart and begin receiving traffic before restarting the next instance.
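
As a minimal sketch of what that manual rolling restart might look like, assuming your instances run as Docker containers as described in the next section: the health port (14001) and the readiness path (/health/readiness) are assumptions, so verify them against your own ZDM Proxy configuration before relying on this check.

# Minimal manual rolling-restart sketch. Port 14001 and /health/readiness
# are assumptions; confirm them for your deployment.
for host in 10.0.0.1 10.0.0.2 10.0.0.3; do   # replace with your proxy host IPs
  ssh "$host" 'docker restart zdm-proxy-container'
  # Wait until the restarted instance reports ready before moving on.
  until ssh "$host" 'curl -sf http://localhost:14001/health/readiness > /dev/null'; do
    sleep 5
  done
done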

Access the proxy logs

To confirm that the ZDM Proxy instances are operating normally, or to investigate an issue, you can view or collect their logs.

You can view the logs for a single proxy instance, or you can use a playbook to systematically retrieve logs from all instances and package them in a zip archive for later inspection.

View the logs

ZDM Proxy runs as a Docker container on each proxy host. To view its logs, connect to a proxy host and run the following command:

docker container logs zdm-proxy-container

To leave the logs open and continuously output the latest log messages, append the --follow (or -f) option to the command above.
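
For example, to follow the last hour of output and surface only error messages (the filter shown is just one possibility):

# docker writes log output to both stdout and stderr, hence the redirect before grep.
docker container logs --follow --since 1h zdm-proxy-container 2>&1 | grep -i error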

Collect the logs

ZDM Proxy Automation has a dedicated playbook, collect_zdm_proxy_logs.yml, that you can use to collect logs for all ZDM Proxy instances in a deployment.

You can view the playbook’s configuration in vars/zdm_proxy_log_collection_config.yml, but no changes are required to run it.

  1. Connect to the Ansible Control Host Docker container. You can do this from the jumphost machine by running the following command:

    docker exec -it zdm-ansible-container bash
    The result is a shell prompt inside the container:
    ubuntu@52772568517c:~$
  2. Run the log collection playbook:

    ansible-playbook collect_zdm_proxy_logs.yml -i zdm_ansible_inventory

    This playbook creates a single zip file, zdm_proxy_logs_TIMESTAMP.zip, that contains the logs from all proxy instances. This archive is stored on the Ansible Control Host Docker container at /home/ubuntu/zdm_proxy_archived_logs.

  3. To copy the archive from the container to the jumphost, open a shell on the jumphost, and then run the following command:

    docker cp zdm-ansible-container:/home/ubuntu/zdm_proxy_archived_logs/zdm_proxy_logs_TIMESTAMP.zip DESTINATION_DIRECTORY_ON_JUMPHOST

    Replace the following:

    • TIMESTAMP: The timestamp from the name of your log file archive

    • DESTINATION_DIRECTORY_ON_JUMPHOST: The path to the directory where you want to copy the archive
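
Once the archive is on the jumphost, you can inspect it with standard tools. For example:

unzip -l zdm_proxy_logs_TIMESTAMP.zip                 # list the collected log files
unzip zdm_proxy_logs_TIMESTAMP.zip -d zdm_proxy_logs  # extract them for inspection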

Change a mutable configuration variable

The following configuration variables are considered mutable and can be changed in a rolling fashion on an existing ZDM Proxy deployment.

Commonly changed variables, located in vars/zdm_proxy_core_config.yml:

  • primary_cluster:

    • This variable determines which cluster is currently considered the primary cluster. At the start of the migration, the primary cluster is the origin cluster because it contains all of the data. In Phase 4 of the migration, once all the existing data has been transferred and any validation/reconciliation step has been successfully executed, you can switch the primary cluster to be the target cluster.

    • Valid values: ORIGIN, TARGET.

  • read_mode:

    • This variable determines how reads are handled by ZDM Proxy.

    • Valid values:

      • PRIMARY_ONLY: reads are only sent synchronously to the primary cluster. This is the default behavior.

      • DUAL_ASYNC_ON_SECONDARY: reads are sent synchronously to the primary cluster and also asynchronously to the secondary cluster. See Enable asynchronous dual reads.

    • Typically, when choosing DUAL_ASYNC_ON_SECONDARY you will want to ensure that primary_cluster is still set to ORIGIN. When you are ready to use the target cluster as the primary cluster, revert read_mode to PRIMARY_ONLY.

  • log_level:

    • Defaults to INFO.

    • Only set to DEBUG if necessary and revert to INFO as soon as possible, as the extra logging can have a slight performance impact.

Other, rarely changed variables:

  • Origin username/password in vars/zdm_proxy_cluster_config.yml

  • Target username/password in vars/zdm_proxy_cluster_config.yml

  • Advanced configuration variables in vars/zdm_proxy_advanced_config.yml:

    • zdm_proxy_max_clients_connections:

      • Maximum number of client connections that ZDM Proxy should accept. Each client connection results in additional cluster connections and the allocation of several in-memory structures, so you can tune this variable to cap the total number of client connections on each instance. A high number of client connections per proxy instance may cause some performance degradation, especially at high throughput.

      • Defaults to 1000.

    • replace_cql_functions:

      • Whether ZDM Proxy should replace standard CQL function calls in write requests with a value computed at proxy level.

      • Currently, only the replacement of now() is supported.

      • Boolean value. Disabled by default. Enabling this will have a noticeable performance impact.

    • zdm_proxy_request_timeout_ms:

      • Global timeout (in ms) of a request at proxy level.

      • This variable determines how long ZDM Proxy will wait for one cluster (for reads) or both clusters (for writes) to reply to a request. If this timeout is reached, ZDM Proxy abandons the request and no longer considers it pending, freeing up the corresponding internal resources. Note that, in this case, ZDM Proxy does not return any result or error: when the client application’s own timeout is reached, the driver times out the request on its side.

      • Defaults to 10000 ms. If your client application has a higher client-side timeout because it is expected to generate requests that take longer to complete, you need to increase this timeout accordingly.

    • origin_connection_timeout_ms and target_connection_timeout_ms:

      • Timeout (in ms) when attempting to establish a connection from the proxy to the origin or the target.

      • Defaults to 30000 ms.

    • async_handshake_timeout_ms:

      • Timeout (in ms) when performing the initialization (handshake) of a proxy-to-secondary cluster connection that will be used solely for asynchronous dual reads.

      • If this timeout occurs, the asynchronous reads will not be sent. This has no impact on the handling of synchronous requests: ZDM Proxy will continue to handle all synchronous reads and writes normally.

      • Defaults to 4000 ms.

    • heartbeat_interval_ms:

      • Frequency (in ms) with which heartbeats will be sent on cluster connections (i.e. all control and request connections to the origin and the target). Heartbeats keep idle connections alive.

      • Defaults to 30000 ms.

    • metrics_enabled:

      • Whether metrics collection should be enabled.

      • Boolean value. Defaults to true, but can be set to false to completely disable metrics collection. This is not recommended.

    • zdm_proxy_max_stream_ids:

      • In the CQL protocol every request has a unique id, named stream id. This variable allows you to tune the maximum pool size of the available stream ids managed by ZDM Proxy per client connection. In the application client, the stream ids are managed internally by the driver, and in most drivers the max number is 2048 (the same default value used in the proxy). If you have a custom driver configuration with a higher value, you should change this property accordingly.

      • Defaults to 2048.

Deprecated variables, which will be removed in a future ZDM Proxy release:

  • forward_client_credentials_to_origin:

    • Whether the credentials provided by the client application are for the origin cluster.

    • Boolean value. Defaults to false (the client application is expected to pass the target credentials); set it to true if the client passes credentials for the origin cluster instead.

To change any of these variables, edit the desired values in vars/zdm_proxy_core_config.yml, vars/zdm_proxy_cluster_config.yml (credentials only) and/or vars/zdm_proxy_advanced_config.yml (mutable variables only, as listed above).
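
For example, at the Phase 4 switchover described above, the relevant entries in vars/zdm_proxy_core_config.yml would look like the following sketch; your actual file may contain additional settings and comments.

# vars/zdm_proxy_core_config.yml (relevant entries only)
primary_cluster: TARGET     # switch the primary cluster to the target
read_mode: PRIMARY_ONLY     # revert from DUAL_ASYNC_ON_SECONDARY once you switch
log_level: INFO             # keep DEBUG only for short troubleshooting windows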

To apply the configuration changes to the ZDM Proxy instances in a rolling fashion, run the following command:

ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory

This playbook operates by recreating each proxy container one by one. The ZDM Proxy deployment remains available at all times and can be safely used throughout this operation. The playbook automates the following steps:

  1. It stops one container gracefully, waiting for it to shut down.

  2. It recreates the container and starts it up.

    A configuration change is a destructive action because containers are considered immutable: the playbook removes the previous container and its logs. Collect the logs before this operation if you want to keep them.

  3. It checks that the container has come up successfully by checking the readiness endpoint:

    1. If unsuccessful, it repeats the check up to six times at 5-second intervals and interrupts the whole process if the check still fails.

    2. If successful, it waits for 10 seconds and then moves on to the next container.

The pause between the restart of each ZDM Proxy instance defaults to 10 seconds. To change this value, you can set the desired number of seconds in zdm-proxy-automation/ansible/vars/zdm_playbook_internal_config.yml.
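
The exact name of the pause variable inside that file depends on your version of ZDM Proxy Automation, so the entry below is only a hypothetical illustration of the change; edit the pause-related variable that already exists in your copy of the file.

# zdm-proxy-automation/ansible/vars/zdm_playbook_internal_config.yml
# NOTE: the variable name below is hypothetical; use the existing
# pause-related variable in your version of this file.
pause_between_instances_seconds: 20   # default pause is 10 seconds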

All configuration variables that are not listed in this section are considered immutable and can only be changed by recreating the deployment.

If you wish to change any of the immutable configuration variables on an existing deployment, you will need to re-run the deployment playbook (deploy_zdm_proxy.yml). This playbook can be run as many times as necessary.

Be aware that running the deploy_zdm_proxy.yml playbook results in a brief window of unavailability of the whole ZDM Proxy deployment while all the ZDM Proxy instances are torn down and recreated.
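
If you do need to recreate the deployment, the command follows the same pattern as the other playbooks and is run from the Ansible Control Host container against the same inventory:

ansible-playbook deploy_zdm_proxy.yml -i zdm_ansible_inventory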

Upgrade the proxy version

The ZDM Proxy version is displayed at startup, in a message such as Starting ZDM Proxy version …. You can also retrieve it at any time by running the image with the -version option:

docker run --rm datastax/zdm-proxy:<version> -version

For example, for ZDM Proxy 2.1.x:

docker run --rm datastax/zdm-proxy:2.1.x -version

The playbook for configuration changes can also be used to upgrade the ZDM Proxy version in a rolling fashion. All containers will be recreated with the image of the specified version. The same behavior and observations as above apply here.

To perform an upgrade, change the version tag number to the desired version in vars/zdm_proxy_container.yml:

zdm_proxy_image: datastax/zdm-proxy:x.y.z

Replace x.y.z with the version you would like to upgrade to.

For example, to upgrade to ZDM Proxy 2.1.0:

zdm_proxy_image: datastax/zdm-proxy:2.1.0

Then run the same playbook as above, with the following command:

ansible-playbook rolling_update_zdm_proxy.yml -i zdm_ansible_inventory

Scale operations with ZDM Proxy Automation

ZDM Proxy Automation doesn’t provide a way to scale a ZDM Proxy deployment up or down in a rolling fashion. If you use ZDM Proxy Automation and need a larger deployment, you have two options:

Recommended: Create a new deployment

This is the recommended way to scale your ZDM Proxy deployment because it requires no downtime.

With this option, you create a new ZDM Proxy deployment, and then move your client application to it:

  1. Create a new ZDM Proxy deployment with the desired topology on a new set of machines.

  2. Change the contact points in the application configuration so that the application instances point to the new ZDM Proxy deployment.

  3. Perform a rolling restart of the application instances to apply the new contact point configuration.

    The rolling restart ensures there is no interruption of service. The application instances switch seamlessly from the old deployment to the new one, and they are able to serve requests immediately.

  4. After restarting all application instances, you can safely remove the old ZDM Proxy deployment.

Add instances to an existing deployment

This option requires some manual effort and a brief period of downtime.

With this option, you change the topology of your existing ZDM Proxy deployment, and then restart the entire deployment to apply the change:

  1. Amend the inventory file so that it contains one line for each machine where you want to deploy a ZDM Proxy instance.

    For example, if you want to add three nodes to a deployment with six nodes, then the amended inventory file must contain nine total IPs: the six existing IPs and the three new IPs (see the sketch after these steps).

  2. Run the deploy_zdm_proxy.yml playbook to apply the change and start the new instances.

    Rerunning the playbook stops the existing instances, destroys them, and then creates and starts a new deployment with new instances based on the amended inventory. This results in a brief interruption of service for your entire ZDM Proxy deployment.
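
For illustration, an amended inventory for the six-plus-three example might look like the sketch below. The group name and the IPs are placeholders; keep whatever structure your existing zdm_ansible_inventory file already uses and simply append the new addresses.

# Placeholder group name and IPs; keep the structure of your existing file.
[proxies]
172.18.10.1
172.18.10.2
172.18.10.3
172.18.10.4
172.18.10.5
172.18.10.6
# The three new instances:
172.18.10.7
172.18.10.8
172.18.10.9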

Scale ZDM Proxy without ZDM Proxy Automation

If you aren’t using ZDM Proxy Automation, you can still add and remove ZDM Proxy instances.

Add an instance
  1. Prepare and configure the new ZDM Proxy instance to match your existing instances (see the sketch after these steps).

    Make sure the new instance’s configuration references all planned ZDM Proxy cluster nodes.

  2. On all ZDM Proxy instances, add the new instance’s address to the ZDM_PROXY_TOPOLOGY_ADDRESSES environment variable.

    If you’re adding more than one instance, include all of the new addresses.

  3. On the new ZDM Proxy instance, set the ZDM_PROXY_TOPOLOGY_INDEX to the next sequential integer after the greatest one in your existing deployment.

  4. Perform a rolling restart of all ZDM Proxy instances, one at a time.
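
As a sketch, assuming your instances run as plain Docker containers like the zdm-proxy-container shown earlier, starting a fourth instance (topology index 3) alongside three existing instances might look like the following. Everything other than the two topology variables named above is a placeholder for the configuration your existing instances already use, and the comma-separated address format is an assumption to verify against your deployment.

# Hypothetical example: copy the exact address format, image tag, and all
# other settings (cluster contact points, credentials, listener address,
# and so on) from your existing instances.
docker run -d --name zdm-proxy-container \
  -e ZDM_PROXY_TOPOLOGY_ADDRESSES="172.18.10.1,172.18.10.2,172.18.10.3,172.18.10.4" \
  -e ZDM_PROXY_TOPOLOGY_INDEX=3 \
  datastax/zdm-proxy:2.1.0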

Vertically scale existing instances

Use these steps to increase or decrease resources, such as CPU or memory, for existing ZDM Proxy instances. To avoid downtime, perform the following steps on one instance at a time (a command-level sketch follows the list):

  1. Stop the first ZDM Proxy instance that you want to modify.

  2. Modify the instance’s resources as required.

    Make sure the instance’s IP address remains the same. If the IP address changes, you need to treat it as a new instance.

  3. Restart the modified ZDM Proxy instance.

  4. Wait until the instance starts, and then confirm that it is receiving traffic.

  5. Repeat these steps to modify each additional instance, one at a time.
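
On a single host, these steps might look like the following sketch, assuming the proxy runs as the Docker container named earlier and that the health port 14001 with the /health/readiness path applies to your deployment; both are assumptions to verify.

docker stop zdm-proxy-container       # step 1: stop the instance gracefully
# step 2: adjust the host's CPU/memory out of band, keeping the same IP address
docker start zdm-proxy-container      # step 3: restart the same container
# step 4: wait until the instance reports ready before moving to the next one
until curl -sf http://localhost:14001/health/readiness > /dev/null; do sleep 5; done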

Remove an instance
  1. On all remaining ZDM Proxy instances, remove the departing instance’s address from the ZDM_PROXY_TOPOLOGY_ADDRESSES environment variable.

  2. Perform a rolling restart of all remaining ZDM Proxy instances.

  3. Clean up resources used by the removed instance, such as the container or VM.

Purpose of proxy topology addresses

When you configure a ZDM Proxy deployment, whether through ZDM Proxy Automation or with manually managed ZDM Proxy instances, you specify the addresses of your instances. These addresses populate the ZDM_PROXY_TOPOLOGY_ADDRESSES variable, either manually or automatically depending on how you manage your instances.

Cassandra drivers discover the nodes of a cluster by querying the system.peers table. ZDM Proxy uses the topology addresses to answer these queries, so the driver treats every proxy instance as a node it can connect to. If no topology addresses are specified, ZDM Proxy defaults to a single-instance configuration, which means driver connections use only that one ZDM Proxy instance rather than all instances in your deployment.

If that one instance goes down, ZDM Proxy won’t know that there are other instances available, and your application can experience an outage. Additionally, if you need to restart ZDM Proxy instances, and there is only one instance specified in the topology addresses, your migration will have downtime while that one instance restarts.
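
For example, in a three-instance deployment, each instance would list all three addresses. The comma-separated format shown here is an assumption to verify; ZDM Proxy Automation populates this variable for you when it manages the deployment.

ZDM_PROXY_TOPOLOGY_ADDRESSES=172.18.10.1,172.18.10.2,172.18.10.3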
