DSBulk Migrator overview
Use DSBulk Migrator to perform small or simple migrations that don’t require data validation other than post-migration row counts. This tool is also an option for migrations where you can shard data from large tables into more manageable quantities.
DSBulk Migrator extends DSBulk Loader with the following commands:
-
migrate-live
: Start a live data migration using the embedded version of DSBulk Loader or your own DSBulk Loader installation. A live migration means that the data migration starts immediately and is performed by the migrator tool through the specified DSBulk Loader installation.
-
generate-script
: Generate a migration script that you can execute to perform a data migration with your own DSBulk Loader installation. This command doesn’t trigger the migration; it only generates the migration script that you must then execute.
-
generate-ddl
: Read the schema from origin, and then generate CQL files to recreate it in your target Astra DB database.
DSBulk Migrator prerequisites
-
Java 11
-
Maven 3.9.x
-
Optional: If you don’t want to use the embedded DSBulk Loader that is bundled with DSBulk Migrator, install DSBulk Loader before installing DSBulk Migrator.
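Before building, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch, not part of DSBulk Migrator: the `require_tool` and `check_prerequisites` helper names are hypothetical, and the check only covers presence, not versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of DSBulk Migrator): succeeds only if the
# named command can be found on the PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "Missing prerequisite: $1" >&2
  return 1
}

# Check the documented build prerequisites (Java and Maven).
# Returns 0 if both are present, 1 if at least one is missing.
check_prerequisites() {
  missing=0
  for tool in java mvn; do
    require_tool "$tool" || missing=1
  done
  return "$missing"
}
```

After sourcing these functions, `check_prerequisites && java -version && mvn -version` reports the installed versions so you can compare them against the list above.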
Build DSBulk Migrator
-
Clone the DSBulk Migrator repository:
cd ~/github
git clone git@github.com:datastax/dsbulk-migrator.git
cd dsbulk-migrator
-
Use Maven to build DSBulk Migrator:
mvn clean package
The build produces two distributable fat jars:
-
dsbulk-migrator-VERSION-embedded-driver.jar
contains an embedded Java driver. Suitable for script generation or live migrations using an external DSBulk Loader. This JAR isn’t suitable for live migrations that use the embedded DSBulk Loader because no DSBulk Loader classes are present.
-
dsbulk-migrator-VERSION-embedded-dsbulk.jar
contains an embedded DSBulk Loader and an embedded Java driver. Suitable for all operations. Much larger than the other JAR due to the presence of DSBulk Loader classes.
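The choice between the two JARs can be sketched as a small shell helper. This is an illustration only: the `jar_for_mode` function name, its `embedded` and `external` mode labels, and the version argument are hypothetical, and the `target/` prefix assumes you run it from the repository root after the Maven build.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the JAR to use for a given mode.
#   $1 = version string of your build (an assumption; substitute your own)
#   $2 = "embedded" to run the embedded DSBulk Loader,
#        "external" to use your own DSBulk Loader installation
jar_for_mode() {
  version=$1
  mode=$2
  case "$mode" in
    embedded) echo "target/dsbulk-migrator-${version}-embedded-dsbulk.jar" ;;
    external) echo "target/dsbulk-migrator-${version}-embedded-driver.jar" ;;
    *) echo "Unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

The function only builds a file name; it doesn't check that the file exists.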
Test DSBulk Migrator
The DSBulk Migrator project contains some integration tests that require Simulacron.
-
Clone and build Simulacron, as explained in the Simulacron GitHub repository. Note the prerequisites for Simulacron, particularly for macOS.
-
Run the tests:
mvn clean verify
Run DSBulk Migrator
Launch DSBulk Migrator with the command and options you want to run:
java -jar /path/to/dsbulk-migrator.jar { migrate-live | generate-script | generate-ddl } [OPTIONS]
The role and availability of the options depend on the command you run:
-
During a live migration, the options configure DSBulk Migrator and establish connections to the clusters.
-
When generating a migration script, most options become default values in the generated scripts. However, even when generating scripts, DSBulk Migrator still needs to access the origin cluster to gather metadata about the tables to migrate.
-
When generating a DDL file, import options and DSBulk Loader-related options are ignored. However, DSBulk Migrator still needs to access the origin cluster to gather metadata about the keyspaces and tables for the DDL statements.
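Because the three commands accept different options, it can be useful to assemble and review the command line before running it. The following is a sketch of a dry-run wrapper: the `migrator_dry_run` name is hypothetical, and the function only prints the command instead of launching DSBulk Migrator.

```shell
#!/bin/sh
# Hypothetical wrapper: validate the subcommand, then print the command line
# that would be run. Nothing is executed.
#   $1 = path to the DSBulk Migrator JAR
#   $2 = one of the three documented commands
#   remaining arguments = options, passed through unchanged
migrator_dry_run() {
  if [ "$#" -lt 2 ]; then
    echo "usage: migrator_dry_run JAR COMMAND [OPTIONS]" >&2
    return 1
  fi
  jar=$1
  cmd=$2
  shift 2
  case "$cmd" in
    migrate-live|generate-script|generate-ddl) ;;
    *) echo "Unknown command: $cmd" >&2; return 1 ;;
  esac
  echo java -jar "$jar" "$cmd" "$@"
}
```

For example, `migrator_dry_run /path/to/dsbulk-migrator.jar generate-ddl --optimize-for-astra` prints the full command so that you can check it before running it yourself.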
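For more information about the commands and their options, see the following references:
-
Live migration command-line options
-
Script generation command-line options
-
DDL generation command-line options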
For help and examples, see Get help with DSBulk Migrator and DSBulk Migrator examples.
Live migration command-line options
The following options are available for the migrate-live command.
Most options have sensible default values and do not need to be specified, unless you want to override the default value.
|
|
The external DSBulk Loader command to use.
Ignored if the embedded DSBulk Loader is being used.
The default is simply |
|
|
The directory where data will be exported to and imported from.
The default is a |
|
|
Use the embedded DSBulk Loader version instead of an external one. The default is to use an external DSBulk Loader command. |
|
The path to a secure connect bundle to connect to the origin cluster, if that cluster is a DataStax Astra DB cluster.
Options |
|
|
The consistency level to use when exporting data.
The default is |
|
|
An extra DSBulk Loader option to use when exporting.
Any valid DSBulk Loader option can be specified here, and it will be passed as is to the DSBulk Loader process.
DSBulk Loader options, including driver options, must be passed as |
|
|
The host name or IP and, optionally, the port of a node from the origin cluster.
If the port is not specified, it will default to |
|
|
The maximum number of concurrent files to write to.
Must be a positive number or the special value |
|
|
The maximum number of concurrent queries to execute.
Must be a positive number or the special value |
|
|
The maximum number of records to export for each table.
Must be a positive number or |
|
|
The password to use to authenticate against the origin cluster.
Options |
|
|
The maximum number of token range queries to generate.
Use the |
|
|
The username to use to authenticate against the origin cluster.
Options |
|
|
|
Displays this help text. |
|
The path to a Secure Connect Bundle (SCB) to connect to a target Astra DB cluster.
Options |
|
|
The consistency level to use when importing data.
The default is |
|
|
The default timestamp to use when importing data.
Must be a valid instant in ISO-8601 syntax.
The default is |
|
|
An extra DSBulk Loader option to use when importing.
Any valid DSBulk Loader option can be specified here, and it will be passed as is to the DSBulk Loader process.
DSBulk Loader options, including driver options, must be passed as |
|
|
The host name or IP and, optionally, the port of a node on the target cluster.
If the port is not specified, it will default to |
|
|
The maximum number of concurrent files to read from.
Must be a positive number or the special value |
|
|
The maximum number of concurrent queries to execute.
Must be a positive number or the special value |
|
|
The maximum number of failed records to tolerate when importing data.
The default is |
|
|
The password to use to authenticate against the target cluster.
Options |
|
|
The username to use to authenticate against the target cluster. Options |
|
|
|
A regular expression to select keyspaces to migrate. The default is to migrate all keyspaces except system keyspaces, DSE-specific keyspaces, and the OpsCenter keyspace. Case-sensitive keyspace names must be entered in their exact case. |
|
|
The directory where the DSBulk Loader should store its logs.
The default is a |
|
The maximum number of concurrent operations (exports and imports) to carry out.
The default is |
|
|
Skip truncate confirmation before actually truncating tables. Only applicable when migrating counter tables, ignored otherwise. |
|
|
|
A regular expression to select tables to migrate.
The default is to migrate all tables in the keyspaces that were selected for migration with |
|
The table types to migrate.
The default is |
|
|
Truncate tables before the export instead of after. The default is to truncate after the export. Only applicable when migrating counter tables, ignored otherwise. |
|
|
|
The directory where |
Script generation command-line options
The following options are available for the generate-script command.
Most options have sensible default values and do not need to be specified, unless you want to override the default value.
|
|
The DSBulk Loader command to use.
The default is simply |
|
|
The directory where data will be exported to and imported from.
The default is a |
|
The path to a secure connect bundle to connect to the origin cluster, if that cluster is a DataStax Astra DB cluster.
Options |
|
|
The consistency level to use when exporting data.
The default is |
|
|
An extra DSBulk Loader option to use when exporting.
Any valid DSBulk Loader option can be specified here, and it will be passed as is to the DSBulk Loader process.
DSBulk Loader options, including driver options, must be passed as |
|
|
The host name or IP and, optionally, the port of a node from the origin cluster.
If the port is not specified, it will default to |
|
|
The maximum number of concurrent files to write to.
Must be a positive number or the special value |
|
|
The maximum number of concurrent queries to execute.
Must be a positive number or the special value |
|
|
The maximum number of records to export for each table.
Must be a positive number or |
|
|
The password to use to authenticate against the origin cluster.
Options |
|
|
The maximum number of token range queries to generate.
Use the |
|
|
The username to use to authenticate against the origin cluster.
Options |
|
|
|
Displays this help text. |
|
The path to a Secure Connect Bundle to connect to a target Astra DB cluster.
Options |
|
|
The consistency level to use when importing data.
The default is |
|
|
The default timestamp to use when importing data.
Must be a valid instant in ISO-8601 syntax.
The default is |
|
|
An extra DSBulk Loader option to use when importing.
Any valid DSBulk Loader option can be specified here, and it will be passed as is to the DSBulk Loader process.
DSBulk Loader options, including driver options, must be passed as |
|
|
The host name or IP and, optionally, the port of a node on the target cluster.
If the port is not specified, it will default to |
|
|
The maximum number of concurrent files to read from.
Must be a positive number or the special value |
|
|
The maximum number of concurrent queries to execute.
Must be a positive number or the special value |
|
|
The maximum number of failed records to tolerate when importing data.
The default is |
|
|
The password to use to authenticate against the target cluster.
Options |
|
|
The username to use to authenticate against the target cluster.
Options |
|
|
|
A regular expression to select keyspaces to migrate. The default is to migrate all keyspaces except system keyspaces, DSE-specific keyspaces, and the OpsCenter keyspace. Case-sensitive keyspace names must be entered in their exact case. |
|
|
The directory where DSBulk Loader should store its logs.
The default is a |
|
|
A regular expression to select tables to migrate.
The default is to migrate all tables in the keyspaces that were selected for migration with |
|
The table types to migrate. The default is |
DDL generation command-line options
The following options are available for the generate-ddl command.
Most options have sensible default values and do not need to be specified, unless you want to override the default value.
|
|
Produce CQL scripts optimized for DataStax Astra DB. Astra DB does not allow some options in DDL statements. When this option is set, DSBulk Migrator omits the options that Astra DB forbids from the generated CQL files. |
|
|
The directory where data will be exported to and imported from.
The default is a |
|
The path to a secure connect bundle to connect to the origin cluster, if that cluster is a DataStax Astra DB cluster.
Options |
|
|
The host name or IP and, optionally, the port of a node from the origin cluster.
If the port is not specified, it will default to |
|
|
The password to use to authenticate against the origin cluster.
Options |
|
|
The username to use to authenticate against the origin cluster.
Options |
|
|
|
Displays this help text. |
|
|
A regular expression to select keyspaces to migrate. The default is to migrate all keyspaces except system keyspaces, DSE-specific keyspaces, and the OpsCenter keyspace. Case-sensitive keyspace names must be entered in their exact case. |
|
|
A regular expression to select tables to migrate.
The default is to migrate all tables in the keyspaces that were selected for migration with |
|
The table types to migrate.
The default is |
DSBulk Migrator examples
These examples show sample username and password values that are for demonstration purposes only.
Don’t use these values in your environment.
Generate a migration script
Generate a migration script to migrate from an existing origin cluster to a target Astra DB cluster:
java -jar target/dsbulk-migrator-<VERSION>-embedded-driver.jar generate-script \
--data-dir=/path/to/data/dir \
--dsbulk-cmd=${DSBULK_ROOT}/bin/dsbulk \
--dsbulk-log-dir=/path/to/log/dir \
--export-host=my-origin-cluster.com \
--export-username=user1 \
--export-password=s3cr3t \
--import-bundle=/path/to/bundle \
--import-username=user1 \
--import-password=s3cr3t
Live migration with an external DSBulk Loader installation
Perform a live migration from an existing origin cluster to a target Astra DB cluster using an external DSBulk Loader installation:
java -jar target/dsbulk-migrator-<VERSION>-embedded-driver.jar migrate-live \
--data-dir=/path/to/data/dir \
--dsbulk-cmd=${DSBULK_ROOT}/bin/dsbulk \
--dsbulk-log-dir=/path/to/log/dir \
--export-host=my-origin-cluster.com \
--export-username=user1 \
--export-password \
--import-bundle=/path/to/bundle \
--import-username=user1 \
--import-password # password will be prompted
Passwords are prompted interactively.
Live migration with the embedded DSBulk Loader
Perform a live migration from an existing origin cluster to a target Astra DB cluster using the embedded DSBulk Loader installation:
java -jar target/dsbulk-migrator-<VERSION>-embedded-dsbulk.jar migrate-live \
--data-dir=/path/to/data/dir \
--dsbulk-use-embedded \
--dsbulk-log-dir=/path/to/log/dir \
--export-host=my-origin-cluster.com \
--export-username=user1 \
--export-password \
--export-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" \
--export-dsbulk-option "--executor.maxPerSecond=1000" \
--import-bundle=/path/to/bundle \
--import-username=user1 \
--import-password \
--import-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" \
--import-dsbulk-option "--executor.maxPerSecond=1000"
Passwords are prompted interactively.
The preceding example passes additional DSBulk Loader options.
The preceding example requires the dsbulk-migrator-<VERSION>-embedded-dsbulk.jar fat jar.
Otherwise, an error is raised because no embedded DSBulk Loader can be found.
Generate DDL files to recreate the origin schema on the target cluster
Generate DDL files to recreate the origin schema on a target Astra DB cluster:
java -jar target/dsbulk-migrator-<VERSION>-embedded-driver.jar generate-ddl \
--data-dir=/path/to/data/dir \
--export-host=my-origin-cluster.com \
--export-username=user1 \
--export-password=s3cr3t \
--optimize-for-astra
Get help with DSBulk Migrator
Use the following command to display the available DSBulk Migrator commands:
java -jar /path/to/dsbulk-migrator-embedded-dsbulk.jar --help
For individual command help and each one’s options:
java -jar /path/to/dsbulk-migrator-embedded-dsbulk.jar COMMAND --help