Importing SQL to a CQL table or CFS

Steps for importing SQL data into a CQL table or the Cassandra File System.

To import data to CQL, the keyspace and CQL table must exist prior to the importation. If the CQL table contains data prior to the importation, cql-import updates the data. The Sqoop demo shows how to import SQL data into a CQL table.

In addition to importing data to a CQL table, you can also import data to the Cassandra File System (CFS). The CFS is the Cassandra counterpart to the Hadoop Distributed File System (HDFS). The example in this section shows how to import SQL data to CFS. Using Hive and other utilities, you can access the CFS data.

Procedure

  1. Follow the steps in the Sqoop demo to create the SQL database and table and the CQL keyspace and table.
  2. Use the dse sqoop import command to migrate the data from the MySQL table to text files in the directory npa_nxx in the CFS. Use the database username and password. If the database account is not password-protected, just omit the password option. On Linux, for example:
    $ bin/dse sqoop import --connect
        jdbc:mysql://127.0.0.1/npa_nxx_demo
        --username mysql
        --password <password>
        --table npa_nxx
        --target-dir /npa_nxx
    DataStax Enterprise returns this message:
    INFO mapreduce.ImportJobBase: Retrieved 105291 records.
  3. Use the command to view the results in the CFS. On Linux, for example:
    $ bin/dse hadoop fs -ls /npa_nxx
    Depending on the number of DataStax Enterprise analytics nodes and task tracker configuration, the output shows a number of files in the directory, part-m-0000n, where n ranges from 0 to the number of tasks that were executed as part of the Hadoop job.

    To view the contents of these files, use this hadoop fs command:

    $ bin/dse hadoop fs -cat /npa_nxx/part-m-00000

    By varying the number of tasks (the 00000), the output looks something like this:

    361991,361,991,27.73,097.40,L,TX,Corpus Christi
    361992,361,992,27.73,097.40,L,TX,Corpus Christi
    361993,361,993,27.73,097.40,L,TX,Corpus Christi
    361994,361,994,27.73,097.40,L,TX,Corpus Christi
    361998,361,998,27.79,097.90,L,TX,Agua Dulce
    361999,361,999,27.80,097.40,W,TX,Padre Island National Seashore

    As shown in the output, the CSV file format that Sqoop requires does not include optional spaces in the delimiter.