DSBulk Migrator
Use DSBulk Migrator to perform simple migrations of smaller data quantities, where data validation (other than post-migration row counts) is not necessary.
DSBulk Migrator prerequisites
- Install or switch to Java 11.
- Install Maven 3.9.x.
- Optionally, install DSBulk Loader if you elect to reference your own external installation of DSBulk instead of the embedded DSBulk that's included in DSBulk Migrator.
- Install Simulacron 0.12.x and its prerequisites, for integration tests.
Building DSBulk Migrator
DSBulk Migrator is built with Maven. First, clone the git repository to your local machine. For example:
cd ~/github
git clone git@github.com:datastax/dsbulk-migrator.git
cd dsbulk-migrator
Then run:
mvn clean package
The build produces two distributable fat jars:
- dsbulk-migrator-<VERSION>-embedded-driver.jar: contains an embedded Java driver; suitable for live migrations using an external DSBulk, or for script generation. This jar is NOT suitable for live migrations using an embedded DSBulk, since no DSBulk classes are present.
- dsbulk-migrator-<VERSION>-embedded-dsbulk.jar: contains an embedded DSBulk and an embedded Java driver; suitable for all operations. Note that this jar is much bigger than the previous one, due to the presence of DSBulk classes.
Testing DSBulk Migrator
The project contains a few integration tests. Run them with:
mvn clean verify
The integration tests require Simulacron. Be sure to meet all the Simulacron prerequisites before running the tests.
Running DSBulk Migrator
Launch the DSBulk Migrator tool:
java -jar /path/to/dsbulk-migrator.jar { migrate-live | generate-script | generate-ddl } [OPTIONS]
When doing a live migration, the options are used to configure DSBulk and to connect to both clusters.
When generating a migration script, most options serve as default values in the generated scripts. Note however that, even when generating scripts, this tool still needs to access the Origin cluster in order to gather metadata about the tables to migrate.
When generating a DDL file, only a few options are meaningful. Because standard DSBulk is not used, and the import cluster is never contacted, import options and DSBulk-related options are ignored. The tool still needs to access the Origin cluster in order to gather metadata about the keyspaces and tables for which to generate DDL statements.
DSBulk Migrator reference
Live migration command-line options
The following options are available for the migrate-live command.
Most options have sensible default values and do not need to be specified, unless you want to override the default value.
- --dsbulk-cmd=CMD: The external DSBulk command to use. Ignored if the embedded DSBulk is being used. The default is simply dsbulk, which assumes that the command is available through the PATH variable.
- --data-dir=PATH: The directory where data will be exported to and imported from. The default is a data subdirectory in the current working directory. The directory is created if it does not exist.
- --dsbulk-use-embedded: Use the embedded DSBulk version instead of an external one. The default is to use an external DSBulk command.
- --export-bundle=PATH: The path to a secure connect bundle to connect to the Origin cluster, if that cluster is a DataStax Astra DB cluster. Options --export-host and --export-bundle are mutually exclusive.
- --export-consistency=CONSISTENCY: The consistency level to use when exporting data. The default is LOCAL_QUORUM.
- --export-dsbulk-option=OPT=VALUE: An extra DSBulk option to use when exporting. Any valid DSBulk option can be specified here, and it will be passed as is to the DSBulk process. DSBulk options, including driver options, must be passed in their long form, for example --executor.maxPerSecond=1000. This option can be specified multiple times.
- --export-host=HOST[:PORT]: The host name or IP and, optionally, the port of a node from the Origin cluster. If the port is not specified, it defaults to 9042. This option can be specified multiple times. Options --export-host and --export-bundle are mutually exclusive.
- --export-max-concurrent-files=NUM|AUTO: The maximum number of concurrent files to write to. Must be a positive number or the special value AUTO. The default is AUTO.
- --export-max-concurrent-queries=NUM|AUTO: The maximum number of concurrent queries to execute. Must be a positive number or the special value AUTO. The default is AUTO.
- --export-max-records=NUM: The maximum number of records to export for each table. Must be a positive number or -1. The default is -1 (export the entire table).
- --export-password[=PASSWORD]: The password to use to authenticate against the Origin cluster. Options --export-username and --export-password must be provided together, or not at all. Omit the parameter value to be prompted for the password interactively.
- --export-splits=NUM|NC: The maximum number of token range queries to generate. Use the NC syntax to specify a multiple of the number of available cores; for example, 8C means 8 times the number of available cores. The default is 8C. This is an advanced setting; you should rarely need to modify the default value.
- --export-username=STRING: The username to use to authenticate against the Origin cluster. Options --export-username and --export-password must be provided together, or not at all.
- --help: Displays this help text.
- --import-bundle=PATH: The path to a secure connect bundle to connect to the Target cluster, if it's a DataStax Astra DB cluster. Options --import-host and --import-bundle are mutually exclusive.
- --import-consistency=CONSISTENCY: The consistency level to use when importing data. The default is LOCAL_QUORUM.
- --import-default-timestamp=TIMESTAMP: The default timestamp to use when importing data. Must be a valid instant in ISO-8601 syntax. The default is 1970-01-01T00:00:00Z.
- --import-dsbulk-option=OPT=VALUE: An extra DSBulk option to use when importing. Any valid DSBulk option can be specified here, and it will be passed as is to the DSBulk process. DSBulk options, including driver options, must be passed in their long form, for example --executor.maxPerSecond=1000. This option can be specified multiple times.
- --import-host=HOST[:PORT]: The host name or IP and, optionally, the port of a node from the Target cluster. If the port is not specified, it defaults to 9042. This option can be specified multiple times. Options --import-host and --import-bundle are mutually exclusive.
- --import-max-concurrent-files=NUM|AUTO: The maximum number of concurrent files to read from. Must be a positive number or the special value AUTO. The default is AUTO.
- --import-max-concurrent-queries=NUM|AUTO: The maximum number of concurrent queries to execute. Must be a positive number or the special value AUTO. The default is AUTO.
- --import-max-errors=NUM: The maximum number of failed records to tolerate when importing data. The default is 1000.
- --import-password[=PASSWORD]: The password to use to authenticate against the Target cluster. Options --import-username and --import-password must be provided together, or not at all. Omit the parameter value to be prompted for the password interactively.
- --import-username=STRING: The username to use to authenticate against the Target cluster. Options --import-username and --import-password must be provided together, or not at all.
- --keyspaces=REGEX: A regular expression to select keyspaces to migrate. The default is to migrate all keyspaces except system keyspaces, DSE-specific keyspaces, and the OpsCenter keyspace. Case-sensitive keyspace names must be entered in their exact case.
- --dsbulk-log-dir=PATH: The directory where DSBulk should store its logs. The default is a logs subdirectory in the current working directory. The directory is created if it does not exist.
- --max-concurrent-ops=NUM: The maximum number of concurrent operations (exports and imports) to carry out. The default is 1.
- --skip-truncate-confirmation: Skip truncate confirmation before actually truncating tables. Only applicable when migrating counter tables, ignored otherwise.
- --tables=REGEX: A regular expression to select tables to migrate. The default is to migrate all tables in the keyspaces that were selected for migration with --keyspaces. Case-sensitive table names must be entered in their exact case.
- --table-types=regular|counter|all: The table types to migrate. The default is all.
- --truncate-before-export: Truncate tables before the export instead of after. The default is to truncate after the export. Only applicable when migrating counter tables, ignored otherwise.
- --dsbulk-working-dir=PATH: The directory where DSBulk should be executed. Ignored if the embedded DSBulk is being used. If unspecified, it defaults to the current working directory.
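The keyspace and table filters are ordinary regular expressions matched against object names. As a quick way to preview what a pattern would select (the keyspace names below are made up for illustration), you can test it against a list of names; the migrator runs on the JVM and uses Java regular expressions, which behave like grep -E for simple patterns such as this one:

```shell
# Hypothetical keyspace names; a filter like '^shop_.*$' selects only the
# application keyspaces and skips the system/OpsCenter ones:
printf '%s\n' system system_auth shop_orders shop_users OpsCenter \
  | grep -E '^shop_.*$'
```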
Script generation command-line options
The following options are available for the generate-script command.
Most options have sensible default values and do not need to be specified, unless you want to override the default value.
- --dsbulk-cmd=CMD: The DSBulk command to use. The default is simply dsbulk, which assumes that the command is available through the PATH variable.
- --data-dir=PATH: The directory where data will be exported to and imported from. The default is a data subdirectory in the current working directory. The directory is created if it does not exist.
- --export-bundle=PATH: The path to a secure connect bundle to connect to the Origin cluster, if that cluster is a DataStax Astra DB cluster. Options --export-host and --export-bundle are mutually exclusive.
- --export-consistency=CONSISTENCY: The consistency level to use when exporting data. The default is LOCAL_QUORUM.
- --export-dsbulk-option=OPT=VALUE: An extra DSBulk option to use when exporting. Any valid DSBulk option can be specified here, and it will be passed as is to the DSBulk process. DSBulk options, including driver options, must be passed in their long form, for example --executor.maxPerSecond=1000. This option can be specified multiple times.
- --export-host=HOST[:PORT]: The host name or IP and, optionally, the port of a node from the Origin cluster. If the port is not specified, it defaults to 9042. This option can be specified multiple times. Options --export-host and --export-bundle are mutually exclusive.
- --export-max-concurrent-files=NUM|AUTO: The maximum number of concurrent files to write to. Must be a positive number or the special value AUTO. The default is AUTO.
- --export-max-concurrent-queries=NUM|AUTO: The maximum number of concurrent queries to execute. Must be a positive number or the special value AUTO. The default is AUTO.
- --export-max-records=NUM: The maximum number of records to export for each table. Must be a positive number or -1. The default is -1 (export the entire table).
- --export-password[=PASSWORD]: The password to use to authenticate against the Origin cluster. Options --export-username and --export-password must be provided together, or not at all. Omit the parameter value to be prompted for the password interactively.
- --export-splits=NUM|NC: The maximum number of token range queries to generate. Use the NC syntax to specify a multiple of the number of available cores; for example, 8C means 8 times the number of available cores. The default is 8C. This is an advanced setting; you should rarely need to modify the default value.
- --export-username=STRING: The username to use to authenticate against the Origin cluster. Options --export-username and --export-password must be provided together, or not at all.
- --help: Displays this help text.
- --import-bundle=PATH: The path to a secure connect bundle to connect to the Target cluster, if it's a DataStax Astra DB cluster. Options --import-host and --import-bundle are mutually exclusive.
- --import-consistency=CONSISTENCY: The consistency level to use when importing data. The default is LOCAL_QUORUM.
- --import-default-timestamp=TIMESTAMP: The default timestamp to use when importing data. Must be a valid instant in ISO-8601 syntax. The default is 1970-01-01T00:00:00Z.
- --import-dsbulk-option=OPT=VALUE: An extra DSBulk option to use when importing. Any valid DSBulk option can be specified here, and it will be passed as is to the DSBulk process. DSBulk options, including driver options, must be passed in their long form, for example --executor.maxPerSecond=1000. This option can be specified multiple times.
- --import-host=HOST[:PORT]: The host name or IP and, optionally, the port of a node from the Target cluster. If the port is not specified, it defaults to 9042. This option can be specified multiple times. Options --import-host and --import-bundle are mutually exclusive.
- --import-max-concurrent-files=NUM|AUTO: The maximum number of concurrent files to read from. Must be a positive number or the special value AUTO. The default is AUTO.
- --import-max-concurrent-queries=NUM|AUTO: The maximum number of concurrent queries to execute. Must be a positive number or the special value AUTO. The default is AUTO.
- --import-max-errors=NUM: The maximum number of failed records to tolerate when importing data. The default is 1000.
- --import-password[=PASSWORD]: The password to use to authenticate against the Target cluster. Options --import-username and --import-password must be provided together, or not at all. Omit the parameter value to be prompted for the password interactively.
- --import-username=STRING: The username to use to authenticate against the Target cluster. Options --import-username and --import-password must be provided together, or not at all.
- --keyspaces=REGEX: A regular expression to select keyspaces to migrate. The default is to migrate all keyspaces except system keyspaces, DSE-specific keyspaces, and the OpsCenter keyspace. Case-sensitive keyspace names must be entered in their exact case.
- --dsbulk-log-dir=PATH: The directory where DSBulk should store its logs. The default is a logs subdirectory in the current working directory. The directory is created if it does not exist.
- --tables=REGEX: A regular expression to select tables to migrate. The default is to migrate all tables in the keyspaces that were selected for migration with --keyspaces. Case-sensitive table names must be entered in their exact case.
- --table-types=regular|counter|all: The table types to migrate. The default is all.
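The token-range split count above accepts either a plain number or the NC multiple-of-cores syntax, where, for example, 8C means 8 times the number of available cores. As a quick sketch of what 8C resolves to on the current machine (this assumes getconf is available, as it is on Linux and macOS):

```shell
# 8C = 8 x the number of available cores; the result varies by machine.
cores=$(getconf _NPROCESSORS_ONLN)
echo "8C on this machine = $((8 * cores)) token range queries"
```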
DDL generation command-line options
The following options are available for the generate-ddl command.
Most options have sensible default values and do not need to be specified, unless you want to override the default value.
- --optimize-for-astra: Produce CQL scripts optimized for DataStax Astra DB. Astra DB does not allow some options in DDL statements. With this option, forbidden Astra DB options are omitted from the generated CQL files.
- --data-dir=PATH: The directory where the generated files will be written. The default is a data subdirectory in the current working directory. The directory is created if it does not exist.
- --export-bundle=PATH: The path to a secure connect bundle to connect to the Origin cluster, if that cluster is a DataStax Astra DB cluster. Options --export-host and --export-bundle are mutually exclusive.
- --export-host=HOST[:PORT]: The host name or IP and, optionally, the port of a node from the Origin cluster. If the port is not specified, it defaults to 9042. This option can be specified multiple times. Options --export-host and --export-bundle are mutually exclusive.
- --export-password[=PASSWORD]: The password to use to authenticate against the Origin cluster. Options --export-username and --export-password must be provided together, or not at all. Omit the parameter value to be prompted for the password interactively.
- --export-username=STRING: The username to use to authenticate against the Origin cluster. Options --export-username and --export-password must be provided together, or not at all.
- --help: Displays this help text.
- --keyspaces=REGEX: A regular expression to select keyspaces to migrate. The default is to migrate all keyspaces except system keyspaces, DSE-specific keyspaces, and the OpsCenter keyspace. Case-sensitive keyspace names must be entered in their exact case.
- --tables=REGEX: A regular expression to select tables to migrate. The default is to migrate all tables in the keyspaces that were selected for migration with --keyspaces. Case-sensitive table names must be entered in their exact case.
- --table-types=regular|counter|all: The table types to migrate. The default is all.
Getting help with DSBulk Migrator
Use the following command to display the available DSBulk Migrator commands:
java -jar /path/to/dsbulk-migrator-embedded-dsbulk.jar --help
For individual command help and each one’s options:
java -jar /path/to/dsbulk-migrator-embedded-dsbulk.jar COMMAND --help
DSBulk Migrator examples
These examples show sample commands for common DSBulk Migrator operations.
Generate migration script
Generate a migration script to migrate from an existing Origin cluster to a Target Astra DB cluster:
java -jar target/dsbulk-migrator-<VERSION>-embedded-driver.jar generate-script \
--data-dir=/path/to/data/dir \
--dsbulk-cmd=${DSBULK_ROOT}/bin/dsbulk \
--dsbulk-log-dir=/path/to/log/dir \
--export-host=my-origin-cluster.com \
--export-username=user1 \
--export-password=s3cr3t \
--import-bundle=/path/to/bundle \
--import-username=user1 \
--import-password=s3cr3t
Migrate live using external DSBulk install
Migrate live from an existing Origin cluster to a Target Astra DB cluster using an external DSBulk installation. Passwords will be prompted interactively:
java -jar target/dsbulk-migrator-<VERSION>-embedded-driver.jar migrate-live \
--data-dir=/path/to/data/dir \
--dsbulk-cmd=${DSBULK_ROOT}/bin/dsbulk \
--dsbulk-log-dir=/path/to/log/dir \
--export-host=my-origin-cluster.com \
--export-username=user1 \
--export-password \
--import-bundle=/path/to/bundle \
--import-username=user1 \
--import-password
Migrate live using embedded DSBulk install
Migrate live from an existing Origin cluster to a Target Astra DB cluster using the embedded DSBulk installation. Passwords will be prompted interactively. In this example, additional DSBulk options are passed.
java -jar target/dsbulk-migrator-<VERSION>-embedded-dsbulk.jar migrate-live \
--data-dir=/path/to/data/dir \
--dsbulk-use-embedded \
--dsbulk-log-dir=/path/to/log/dir \
--export-host=my-origin-cluster.com \
--export-username=user1 \
--export-password \
--export-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" \
--export-dsbulk-option "--executor.maxPerSecond=1000" \
--import-bundle=/path/to/bundle \
--import-username=user1 \
--import-password \
--import-dsbulk-option "--connector.csv.maxCharsPerColumn=65536" \
--import-dsbulk-option "--executor.maxPerSecond=1000"
In the example above, you must use the embedded-dsbulk fat jar, because the migration is performed with the embedded DSBulk (--dsbulk-use-embedded).
Generate DDL to recreate Origin schema in Target
Generate DDL files to recreate the Origin schema in a Target Astra DB cluster:
java -jar target/dsbulk-migrator-<VERSION>-embedded-driver.jar generate-ddl \
--data-dir=/path/to/data/dir \
--export-host=my-origin-cluster.com \
--export-username=user1 \
--export-password=s3cr3t \
--optimize-for-astra