Running the Sqoop demo

Run SQL commands to put the data from a CSV file into a MySQL table in a MySQL database. Then import the SQL data from MySQL to a CQL table in Cassandra.

The Sqoop demo migrates the data from a MySQL table to text files in the CassandraFS and then to Cassandra CQL 2 tables. The Sqoop data migration demo uses the MySQL database and data from the North American Numbering Plan. This data consists of the area-code (NPA) and telephone number (Nxx) for the USA and Canada.

Prerequisites

To run the demo, you need:
  • An installation of JDK 1.6.x or later. The JRE alone will not work.
  • An installation of MySQL
  • Sufficient MySQL database privileges to create database objects
  • A JDBC driver for MySQL in the sqoop/lib directory
  • The connection string that is appropriate for the JDBC driver
  • One or more DSE nodes running the Analytics workload to run the Hadoop job that actually imports data from the external data source
  • A PATH environment variable that includes the bin directory of the DSE installation
  • To import data to nodes in a cluster, database permissions must be granted to the nodes. For example, use the GRANT ALL command to grant MySQL access to the hosts.

Procedure

To run the Sqoop demo on a single node on a Mac, for example, follow these steps.

  1. Install MySQL and download the JDBC driver for MySQL from the MySQL site. This example uses mysql-connector-java-5.1.29-bin.jar.
  2. Copy the JDBC driver for MySQL to the sqoop/lib directory.
    • Packaged installs: /etc/dse/sqoop/lib
    • Tarball installs: install_location/resources/sqoop/lib
  3. On the operating system command line, start the MySQL daemon. For example:
    $ sudo ./mysqld_safe --user=mysql
  4. Create the demo database:
    mysql> CREATE DATABASE npa_nxx_demo;
  5. Then connect to the database and create the table:
    mysql> USE npa_nxx_demo; 
    
    mysql> CREATE TABLE npa_nxx ( 
          npa_nxx_key varchar (16) NOT NULL, 
          npa         varchar (3)  DEFAULT NULL, 
          nxx         varchar (3)  DEFAULT NULL, 
          lat         varchar (8)  DEFAULT NULL, 
          lon         varchar (8)  DEFAULT NULL, 
          line type   varchar (1)  DEFAULT NULL, 
          state       varchar (2)  DEFAULT NULL, 
          city        varchar (36) DEFAULT NULL, 
          PRIMARY KEY (npa_nxx_key) 
        ) ENGINE = InnoDB DEFAULT CHARSET = latin1;
  6. Locate the demos/sqoop directory:
    • Packaged installs: /usr/share/dse-demos/sqoop
    • Tarball installs: install_location/demos/sqoop
  7. Populate the table by loading the CSV file in the demos/sqoop directory.
    mysql> LOAD DATA LOCAL INFILE 'npa_nxx.csv' 
             INTO TABLE npa_nxx_demo . npa_nxx 
             FIELDS TERMINATED BY ',' 
             ENCLOSED BY '"' 
             LINES TERMINATED BY '\n' ;

    MySQL returns the following message:

    Query OK, 105291 rows affected (1.01 sec) Records: 105291 Deleted: 0 Skipped: 0 Warnings: 0
  8. Start DSE as an analytics node.
    • RHEL or Debian installations

      Edit /etc/default/dse, set HADOOP_ENABLED=1, and start the dse service.

    • Tar distribution, such as Mac

      From the installation directory of the DSE installation directory, use the -t option to start dse. The -t option starts Hadoop and marks the node for Analytics:

      $ bin/dse cassandra -t
  9. Use the dse command in the bin directory to migrate the data from the MySQL table to text files in the directory npa_nxx in the CassandraFS.

    Use the database username and password or -P instead of --password to be prompted for the database password. If the database account is not password-protected, just omit the password option.

    $ bin/dse sqoop import --connect
        jdbc:mysql://127.0.0.1/npa_nxx_demo
        --username mysql
        --password password
        --table npa_nxx
        --target-dir /npa_nxx
    DSE returns this message:
    INFO mapreduce.ImportJobBase: Retrieved 105291 records.
  10. Delete the npa_nxx.java file in $DSE_HOME/bin.
  11. Use the dse command again, but this time to migrate the data from the CassandraFS to a Cassandra CQL 2 table, as follows:
    $ sudo ./dse sqoop import --connect jdbc:mysql://127.0.0.1/npa_nxx_demo \ 
      --username root \ 
      --table npa_nxx \ 
      --cassandra-keyspace newKS \ 
      --cassandra-column-family npa_nxx_cf \ 
      --cassandra-row-key npa_nxx_key \ 
      --cassandra-thrift-host 127.0.0.1 \ 
      --cassandra-create-schema