About Sqoop

DSE Hadoop supports Sqoop for migrating data to and from relational databases, and password authentication for Sqoop operations.

Sqoop is an Apache Software Foundation tool for transferring data between Hadoop and an RDBMS or other data sources, such as NoSQL databases. DataStax Enterprise supports the following operations:
  • Import and export data to and from CQL tables and any JDBC-compliant data source (see the example after this list).
  • Import SQL data into CQL collection columns: sets, lists, and maps.
  • Import data into CQL using a reusable, file-based import command.
  • Import legacy data using the thrift-import tool, which maintains backward compatibility with earlier DataStax Enterprise versions.
  • Use conventional Sqoop commands to import data into the Cassandra File System (CFS), the counterpart to HDFS, instead of a CQL table.
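
For example, a minimal cql-import invocation might look like the following sketch. The JDBC connection string, credentials, source table npa_nxx, and the CQL keyspace and table names are placeholders for illustration; check the cql-import options against the Sqoop reference for your DataStax Enterprise version.

  $ dse sqoop cql-import \
        --connect jdbc:mysql://127.0.0.1/npa_nxx_demo \
        --username mysql_user \
        --password mysql_password \
        --table npa_nxx \
        --cassandra-keyspace npa_nxx \
        --cassandra-table npa_nxx_data \
        --cassandra-host 127.0.0.1

To make the import reusable, you can place the same options one per line in a file and pass it with the standard Sqoop --options-file argument.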

You can import and export the MySQL, PostgreSQL, and Oracle data types listed in the Sqoop reference. An analytics node runs the MapReduce job that imports data from, and exports data to, the data source using Sqoop. You need a JDBC driver for the RDBMS or other type of data source.
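
For example, to make a MySQL driver available on an analytics node, you might copy its JAR into the Sqoop library directory before running the job. The driver file name and the package-install path below are assumptions; use the driver version you downloaded and the lib directory of your installation.

  $ cp mysql-connector-java-5.1.36-bin.jar /usr/share/dse/resources/sqoop/lib/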

Importing data 

You can import data from any JDBC-compliant data source. For example:

  • DB2
  • MySQL
  • Oracle
  • SQL Server
  • Sybase
Note: For databases that Sqoop does not support directly, use the generic JDBC driver. Importing from some of these database types requires DataStax Enterprise 4.7.4 or later.
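
As a sketch, an import through the generic JDBC driver typically names the driver class explicitly with the standard Sqoop --driver argument. The driver class, connection string, and table names below are placeholders:

  $ dse sqoop cql-import \
        --driver com.example.jdbc.Driver \
        --connect jdbc:example://db.example.com:1521/sales \
        --table customers \
        --cassandra-keyspace sales_ks \
        --cassandra-table customers \
        --cassandra-host 127.0.0.1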

Securing Sqoop 

DataStax Enterprise supports password authentication for Sqoop operations; configure it using the Cassandra-specific properties. Kerberos authentication and client-to-node encryption (SSL) are also supported for data that Sqoop imports and exports.
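
For example, with password authentication enabled, a cql-import might pass Cassandra credentials alongside the RDBMS credentials. The Cassandra-specific option names below follow the cql-import convention in the Sqoop reference; verify them, and all other values, against your DataStax Enterprise version:

  $ dse sqoop cql-import \
        --connect jdbc:mysql://127.0.0.1/npa_nxx_demo \
        --username mysql_user \
        --password mysql_password \
        --table npa_nxx \
        --cassandra-keyspace npa_nxx \
        --cassandra-table npa_nxx_data \
        --cassandra-username cassandra_user \
        --cassandra-password cassandra_password \
        --cassandra-host 127.0.0.1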