About Sqoop

DSE Hadoop supports Sqoop for migrating data and provides password authentication for Sqoop operations.

Sqoop, an Apache Software Foundation tool, transfers data between an RDBMS data source and Hadoop, or between other data sources such as NoSQL databases. DataStax Enterprise supports the following operations:
  • Import and export data to and from CQL tables and any JDBC-compliant data source (a cql-import sketch follows this list).
  • Import SQL data into CQL collections: sets, lists, and maps.
  • Import data into CQL using a reusable, file-based import command.
  • Import legacy data using the thrift-import tool, which supports backward compatibility with earlier DataStax Enterprise versions.
  • Use conventional Sqoop commands to import data into the Cassandra File System (CFS), the counterpart to HDFS, instead of a CQL table.
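
For example, a minimal cql-import sketch might look like the following. The connection string, credentials, and table names are placeholders, and this assumes the target keyspace and table already exist; verify the option names against the Sqoop reference for your DSE version.

  dse sqoop cql-import \
      --connect jdbc:mysql://127.0.0.1/npa_nxx_demo \
      --username mysqluser --password mysqlpass \
      --table npa_nxx \
      --cassandra-keyspace npa_nxx \
      --cassandra-table npa_nxx_data

To make an import reusable, the same options can be placed one per line in a file and passed with Sqoop's standard --options-file argument.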

You can import and export the MySQL, PostgreSQL, and Oracle data types that Sqoop supports. A DSE Analytics node runs the MapReduce job that imports data from, and exports data to, the data source using Sqoop. You need a JDBC driver for the RDBMS or other type of data source.
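
As a sketch of the driver setup, assuming a MySQL source: copy the JDBC driver jar onto each DSE Analytics node that runs the Sqoop job. The jar version and the destination directory below are illustrative and vary by installation type and DSE version.

  cp mysql-connector-java-5.1.36-bin.jar install_location/resources/sqoop/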

Importing data 

You can import data from any JDBC-compliant data source. For example:

  • DB2
  • MySQL
  • Oracle
  • SQL Server
  • Sybase
Note: For databases that are not directly supported by Sqoop, use the generic JDBC driver. Importing data from some unsupported database types requires DataStax Enterprise 4.7.4 or later.
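
For instance, a conventional Sqoop import into CFS through the generic JDBC path might look like the following sketch. The DB2 driver class, connection string, and target directory are illustrative; --driver, --connect, and --target-dir are standard Sqoop options.

  dse sqoop import \
      --driver com.ibm.db2.jcc.DB2Driver \
      --connect jdbc:db2://db2host:50000/SAMPLE \
      --username db2user --password db2pass \
      --table EMPLOYEE \
      --target-dir /sqoop_import/employee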

Securing Sqoop 

DataStax Enterprise supports multiple security options and password authentication for Sqoop operations: