Using Sqoop

Sqoop is an Apache Software Foundation tool for transferring data between Hadoop and an RDBMS or other data sources, such as NoSQL databases.

DataStax Enterprise support for Sqoop lets you import data from an external data source into Hadoop, Hive, or Cassandra tables. A DSE node runs the Hadoop/Analytics workload, and the Hadoop job uses Sqoop to import the data from the data source.

Running the Sqoop demo 

To get started using Sqoop, first run the Sqoop demo to import data from a MySQL table to text files in the Cassandra File System (CFS).

Importing data 

You can import data from any JDBC-compliant data source. For example:

  • DB2
  • MySQL
  • Oracle
  • SQL Server
  • Sybase

You need a JDBC driver for the RDBMS or other type of data source.
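
As a rough sketch of what an import from a JDBC source can look like, the script below assembles a dse sqoop import command for a MySQL database and prints it for review instead of running it. The host, database, table, user, and target directory are hypothetical placeholders, not values from the demo. The --connect, --username, --table, and --target-dir arguments are standard Sqoop import options. The sketch assumes the MySQL JDBC driver is already available to Sqoop.

```shell
#!/bin/sh
# Hypothetical connection details; replace them with values for your data source.
DB_HOST="127.0.0.1"
DB_NAME="example_db"
DB_TABLE="example_table"
DB_USER="example_user"
TARGET_DIR="/example_import"   # hypothetical destination directory for the text files

# MySQL JDBC connection string: jdbc:mysql://<host>/<database>
JDBC_URL="jdbc:mysql://${DB_HOST}/${DB_NAME}"

# Assemble the import command from the pieces above.
SQOOP_CMD="./dse sqoop import --connect ${JDBC_URL} --username ${DB_USER} --table ${DB_TABLE} --target-dir ${TARGET_DIR}"

# Print the command for review rather than executing it.
echo "${SQOOP_CMD}"
```

For another database, such as Oracle or SQL Server, the connection string would use that vendor's JDBC URL format and driver.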

Migrating data to a Cassandra table 

After importing data into text files in CFS, the demo shows you how to extend the basic sqoop import command with Cassandra options to migrate data to a Cassandra CQL 2 table.

Getting information about the sqoop command 

Use the --help option of the sqoop import command to display help on Sqoop command-line options. For example, on the Mac:

$ cd install_location/bin

$ ./dse sqoop import --help