DataStax Astra DB Serverless Documentation

Deployment and infrastructure considerations

Choosing where to deploy the proxy

A typical ZDM Proxy deployment is made up of multiple proxy instances. A minimum of three proxy instances is recommended for any deployment apart from those for demo or local testing purposes.

All ZDM Proxy instances must be reachable by the client application and must be able to connect to your Origin and Target clusters. The ZDM Proxy process is lightweight, requiring only a small amount of resources and no storage to persist state (apart from logs).

The ZDM Proxy should be deployed close to your client application instances. This can be on any cloud provider or on-premises, depending on your existing infrastructure.

If you have a multi-DC cluster with multiple sets of client application instances deployed to geographically distributed data centers, you should plan a separate ZDM Proxy deployment for each data center.

Here’s a typical deployment showing connectivity between client applications, ZDM Proxy instances, and clusters:

[Diagram: client applications connecting to the ZDM Proxy instances, which in turn connect to the Origin and Target clusters]

Infrastructure requirements

To deploy the ZDM Proxy and its companion monitoring stack, you will have to provision infrastructure that meets the following requirements.

Machines

We will use the term "machine" to indicate a cloud instance (on any cloud provider), a VM, or a physical server.

  • N machines to run the desired number of ZDM Proxy instances:

    • You will need one machine for each ZDM Proxy instance.

    • Requirements for each ZDM Proxy instance:

      • Ubuntu Linux 20.04 or newer, or RedHat family Linux 7 or newer

      • 4 vCPUs

      • 8GB RAM

      • 20GB - 100GB root volume

      • Equivalent to AWS c5.xlarge / GCP e2-standard-4 / Azure A4 v2

  • One machine for the jumphost, which is typically also used as the Ansible Control Host and to run the monitoring stack (Prometheus + Grafana):

    • The most common option is using a single machine for all these functions, but you could split these functions across different machines if you prefer.

    • Requirements:

      • Ubuntu Linux 20.04 or newer, or RedHat family Linux 7 or newer

      • 8 vCPUs

      • 16GB RAM

      • 200GB - 500GB storage (depending on the amount of metrics history that you wish to retain)

      • Equivalent to AWS c5.2xlarge / GCP e2-standard-8 / Azure A8 v2

  • One or more machines to run either DSBulk Migrator or Cassandra Data Migrator.

    • It’s recommended to start with at least one VM with 16 vCPUs, 64GB RAM, and a minimum of 200GB storage. Depending on the total amount of data to be migrated, more than one VM may be needed.

    • Requirements:

      • Ubuntu Linux 18.04 or newer

      • 16 vCPUs

      • 64GB RAM

      • 200GB - 2TB storage (if you use dsbulk-migrator to unload multiple terabytes of data from Origin and then load them into Target, you may need additional space to accommodate the data being staged)

      • Equivalent to AWS m5.4xlarge / GCP e2-standard-16 / Azure D16v5

Scenario: suppose you have close to 12 TB of data spread across several tables. To speed up the migration of your existing data, you could run (for example) four machines, each the equivalent of an AWS m5.4xlarge, a GCP e2-standard-16, or an Azure D16v5, and run DSBulk Migrator on each machine, with each one responsible for a quarter of the full token range. Alternatively, leverage the parallelism of a Cassandra Data Migrator Spark job to run the migration process across all four machines.
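
As a concrete sketch of that quarter-split (assuming the default Murmur3Partitioner, whose token range spans -2^63 to 2^63-1, and four migration machines), the sub-range boundaries can be computed as follows. This is illustrative arithmetic only, not a tool-specific command:

```shell
# Split the full Murmur3 token range (-2^63 .. 2^63-1) into four
# contiguous sub-ranges, one per migration machine.
MIN=$(( -9223372036854775807 - 1 ))  # -2^63, written to avoid a literal that overflows
MAX=9223372036854775807              #  2^63 - 1
STEP=$(( 1 << 62 ))                  #  2^64 / 4 machines

start=$MIN
for i in 1 2 3 4; do
  if [ "$i" -eq 4 ]; then
    end=$MAX                         # last sub-range absorbs the final token
  else
    end=$(( start + STEP - 1 ))
  fi
  echo "machine $i: tokens $start .. $end"
  if [ "$i" -lt 4 ]; then
    start=$(( end + 1 ))             # next sub-range starts right after this one
  fi
done
```

Each printed sub-range can then be assigned to one migration machine, for example via a token()-restricted query; the exact flags depend on the migration tool and version.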

Connectivity

The ZDM Proxy machines must be reachable by:

  • The client application instances, on port 9042

  • The monitoring machine on port 14001

  • The jumphost on port 22

  • Important: the ZDM Proxy machines should not be directly accessible by external machines; the only direct access to them should be from the jumphost.

The ZDM Proxy machines must be able to connect to the Origin and Target cluster nodes:

  • For self-managed (non-Astra DB) clusters, connectivity is needed to the Cassandra native protocol port (typically 9042)

  • For Astra DB clusters, you will need to ensure outbound connectivity to the Astra endpoint indicated in the Secure Connect Bundle. Connectivity over Private Link is also supported.
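
Before deploying, these reachability requirements can be spot-checked from each machine. The sketch below assumes bash (it relies on the bash-only /dev/tcp feature), and the hostnames are placeholders for your own deployment, not real endpoints:

```shell
# Minimal TCP port-reachability check (bash-specific /dev/tcp).
check_port() {
  # Succeeds (exit 0) if a TCP connection to host $1, port $2 opens
  # within 3 seconds; says nothing about TLS or authentication.
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example spot checks (replace the hostnames before running):
# check_port origin-node-1 9042 && echo "Origin native port reachable"
# check_port zdm-proxy-0   9042 && echo "ZDM Proxy client port reachable"
# check_port zdm-proxy-0  14001 && echo "ZDM Proxy metrics port reachable"
```

A zero exit status only means the TCP connection opened; driver-level connectivity (credentials, Secure Connect Bundle) still needs to be verified separately.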

The connectivity requirements for the jumphost / monitoring machine are:

  • Connecting to the ZDM Proxy instances: on port 14001 for metrics collection, and on port 22 to run the Ansible automation and for log inspection or troubleshooting

  • Allowing incoming ssh connections from outside, potentially from allowed IP ranges only

  • Exposing the Grafana UI on port 3000

  • Important: it is strongly recommended to restrict external access to this machine to specific IP ranges (for example, the IP range of your corporate networks or trusted VPNs)
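
As an illustration of the IP-range restriction above, assuming a ufw-based Ubuntu jumphost, the rules could look like the following. 203.0.113.0/24 is a documentation placeholder for your trusted corporate/VPN range, and the script prints the rules for review rather than applying them:

```shell
# Illustrative ufw rules for the jumphost; printed (not executed) so they
# can be reviewed before being applied as root.
TRUSTED_CIDR="203.0.113.0/24"   # placeholder: your corporate/VPN range

print_jumphost_rules() {
  cat <<EOF
ufw default deny incoming
ufw allow from $TRUSTED_CIDR to any port 22 proto tcp
ufw allow from $TRUSTED_CIDR to any port 3000 proto tcp
ufw enable
EOF
}

print_jumphost_rules
```

Review the printed rules and apply them as root only once you have confirmed the trusted range; a complete policy would also account for the metrics and internal traffic described above.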

The ZDM Proxy and monitoring machines must be able to connect externally, as the automation downloads:

  • various software packages (Docker, Prometheus, Grafana);

  • the ZDM Proxy image from its DockerHub repository.

Connecting to the ZDM infrastructure from an external machine

To connect to the jumphost from an external machine, ensure that its IP address belongs to a permitted IP range. If you are connecting through a VPN that only intercepts connections to selected destinations, you may have to add a route from your VPN IP gateway to the public IP of the jumphost.

To simplify connecting to the jumphost and, through it, to the ZDM Proxy instances, you can create a custom SSH config file. Use the following template, replacing all the placeholders in angle brackets with the appropriate values for your deployment and adding more entries if you have more than three proxy instances. Save the file, for example as zdm_ssh_config.

Host <jumphost_private_IP_address> jumphost
  Hostname <jumphost_public_IP_address>
  Port 22

Host <private_IP_address_of_proxy_instance_0> zdm-proxy-0
  Hostname <private_IP_address_of_proxy_instance_0>
  ProxyJump jumphost

Host <private_IP_address_of_proxy_instance_1> zdm-proxy-1
  Hostname <private_IP_address_of_proxy_instance_1>
  ProxyJump jumphost

Host <private_IP_address_of_proxy_instance_2> zdm-proxy-2
  Hostname <private_IP_address_of_proxy_instance_2>
  ProxyJump jumphost

Host *
    User <linux user>
    IdentityFile <absolute path of the locally-generated key pair for the ZDM infrastructure, for example ~/.ssh/zdm-key-XXX>
    IdentitiesOnly yes
    StrictHostKeyChecking no
    GlobalKnownHostsFile /dev/null
    UserKnownHostsFile /dev/null
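
If you have many proxy instances, an equivalent file can be generated with a short script. This is a sketch only: the IPs, user name, and key path below are placeholders to replace with your deployment's values:

```shell
# Generate an SSH config equivalent to the template above.
# All values here are placeholders.
JUMP_PUBLIC_IP="203.0.113.10"
PROXY_PRIVATE_IPS="10.0.0.11 10.0.0.12 10.0.0.13"
LINUX_USER="ubuntu"
KEY_FILE="$HOME/.ssh/zdm-key"

{
  printf 'Host jumphost\n  Hostname %s\n  Port 22\n\n' "$JUMP_PUBLIC_IP"
  i=0
  for ip in $PROXY_PRIVATE_IPS; do
    # Each proxy gets two aliases: its private IP and zdm-proxy-<n>.
    printf 'Host %s zdm-proxy-%d\n  Hostname %s\n  ProxyJump jumphost\n\n' "$ip" "$i" "$ip"
    i=$(( i + 1 ))
  done
  printf 'Host *\n  User %s\n  IdentityFile %s\n  IdentitiesOnly yes\n' "$LINUX_USER" "$KEY_FILE"
  printf '  StrictHostKeyChecking no\n  GlobalKnownHostsFile /dev/null\n  UserKnownHostsFile /dev/null\n'
} > zdm_ssh_config
```

The generated file is used exactly like the hand-written template, e.g. ssh -F zdm_ssh_config zdm-proxy-0.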

With this file, you can connect to your jumphost simply with:

ssh -F zdm_ssh_config jumphost

Likewise, connecting to any ZDM Proxy instance is as easy as this (replacing the instance number as desired):

ssh -F zdm_ssh_config zdm-proxy-0
