Running the Sqoop demo

Run SQL commands to put the data from a CSV file into a MySQL table in a MySQL database. Then import the SQL data from MySQL to a CQL table in Cassandra.

The Sqoop demo migrates the data from a MySQL table to text files in the CassandraFS. The Sqoop data migration demo uses the MySQL database and data from the North American Numbering Plan. This data consists of the area-code (NPA) and telephone number (Nxx) for the USA and Canada.

Prerequisites

To run the demo, you need:

  • Latest version of Oracle Java SE Development Kit (JDK) 7. The JRE alone will not work.
  • An installation of MySQL
  • Sufficient MySQL database privileges to create database objects
  • A JDBC driver for MySQL in the sqoop/lib directory
  • The connection string that is appropriate for the JDBC driver
  • One or more DSE nodes running the Analytics workload to run the Hadoop job that actually imports data from the external data source
  • A PATH environment variable that includes the bin directory of the DSE installation
  • To run this demo and import data to nodes in a cluster, the database permissions must be granted to the nodes. For example, use the GRANT ALL command to grant MySQL access to the hosts.

Procedure

To run the Sqoop demo on a single node on a Mac, for example, follow these steps:

  1. Install MySQL and download the JDBC driver for MySQL from the MySQL site. This example uses mysql-connector-java-5.1.29-bin.jar.
  2. Copy the JDBC driver for MySQL to the sqoop/lib directory.
    • Packaged installs: install_location/usr/share/dse/sqoop/lib
    • Tarball installs: /resources/sqoop/lib
  3. On the operating system command line, start the MySQL daemon. For example:
    $ sudo ./mysqld_safe --user=mysql
  4. Start MySQL and create the demo database:
    mysql> CREATE DATABASE npa_nxx_demo ;
  5. Then connect to the database and create the table:
    mysql> USE npa_nxx_demo;
    
    mysql> CREATE TABLE npa_nxx (
             npa_nxx_key varchar(16) NOT NULL,
             npa         varchar(3)  DEFAULT NULL,
             nxx         varchar(3)  DEFAULT NULL,
             lat         varchar(8)  DEFAULT NULL,
             lon         varchar(8)  DEFAULT NULL,
             linetype    varchar(1)  DEFAULT NULL,
             state       varchar(2)  DEFAULT NULL,
             city        varchar(36) DEFAULT NULL,
             PRIMARY KEY (npa_nxx_key)
           ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    
  6. Locate the demos/sqoop directory:
    • Packaged installs: /usr/share/dse-demos/sqoop
    • Tarball installs: install_location/demos/sqoop
  7. Populate the table by loading the CSV file in the demos/sqoop directory.
    mysql> LOAD DATA LOCAL INFILE 'npa_nxx.csv'
             INTO TABLE npa_nxx_demo.npa_nxx
             FIELDS TERMINATED BY ','
             ENCLOSED BY '"'
             LINES TERMINATED BY '\n';

    MySQL returns the following message:

    Query OK, 105291 rows affected (1.01 sec) Records: 105291 Deleted: 0 Skipped: 0 Warnings: 0
  8. Start DSE as an analytics node.
    • Packaged installs:

      Edit /etc/default/dse, set HADOOP_ENABLED=1 in the cassandra.yaml to start the DSE service.

    • Tarball installs:

      From the installation directory of the DSE installation directory, use the -t option to start an analytics/Hadoop node:

      bin/dse cassandra -t
  9. Use the dse command in the bin directory to migrate the data from the MySQL table to text files in the directory npa_nxx in the CassandraFS.

    Use the database username and password or -P instead of --password to be prompted for the database password. If the database account is not password-protected, just omit the password option.

    $ bin/dse sqoop import --connect
        jdbc:mysql://127.0.0.1/npa_nxx_demo
        --username mysql
        --password <password>
        --table npa_nxx
        --target-dir /npa_nxx
    DSE returns this message: INFO mapreduce.ImportJobBase: Retrieved 105291 records.