Run SQL commands to put the data from a CSV file into a MySQL table in a MySQL
database. Then import the SQL data from MySQL to a CQL table in Cassandra.
The Sqoop demo migrates the data from a MySQL table to text files in the CassandraFS
and then to Cassandra CQL 2 tables. The Sqoop data migration demo uses the MySQL
database and data from the North American Numbering Plan. This data consists of the
area-code (NPA) and telephone number (Nxx) for the USA and Canada.
Prerequisites
To run the demo, you need:
- An installation of JDK 1.6.x or later. The JRE alone will not work.
- An installation of MySQL
- Sufficient MySQL database privileges to create database objects
- A JDBC driver for MySQL in the sqoop/lib directory
- The connection string that is appropriate for the JDBC driver
- One or more DSE nodes running the Analytics workload to run the Hadoop job
that actually imports data from the external data source
- A PATH environment variable that includes the bin directory of the DSE
installation
- To import data to nodes in a cluster, database permissions must be granted
to the nodes. For example, use the GRANT ALL command to grant MySQL access
to the hosts.
Procedure
To run the Sqoop demo on a single node on a Mac, for example, follow these
steps.
-
Install MySQL and download the JDBC driver for MySQL from the MySQL site. This example uses
mysql-connector-java-5.1.29-bin.jar.
-
Copy the JDBC driver for MySQL to the sqoop/lib
directory.
- Packaged installs:
/etc/dse/sqoop/lib
- Tarball installs: install_location/resources/sqoop/lib
-
On the operating system command line, start the MySQL daemon. For
example:
$ sudo ./mysqld_safe --user=mysql
-
Create the demo database:
mysql> CREATE DATABASE npa_nxx_demo;
-
Then connect to the database and create the table:
mysql> USE npa_nxx_demo;
mysql> CREATE TABLE npa_nxx (
npa_nxx_key varchar (16) NOT NULL,
npa varchar (3) DEFAULT NULL,
nxx varchar (3) DEFAULT NULL,
lat varchar (8) DEFAULT NULL,
lon varchar (8) DEFAULT NULL,
line type varchar (1) DEFAULT NULL,
state varchar (2) DEFAULT NULL,
city varchar (36) DEFAULT NULL,
PRIMARY KEY (npa_nxx_key)
) ENGINE = InnoDB DEFAULT CHARSET = latin1;
-
Locate the demos/sqoop directory:
- Packaged installs: /usr/share/dse-demos/sqoop
- Tarball installs: install_location/demos/sqoop
-
Populate the table by loading the CSV file in the
demos/sqoop directory.
mysql> LOAD DATA LOCAL INFILE 'npa_nxx.csv'
INTO TABLE npa_nxx_demo . npa_nxx
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n' ;
MySQL returns the following message:
Query OK, 105291 rows
affected (1.01 sec) Records: 105291 Deleted: 0 Skipped: 0 Warnings:
0
-
Start DSE as an analytics node.
-
RHEL or Debian installations
Edit /etc/default/dse, set
HADOOP_ENABLED=1, and start the dse service.
-
Tar distribution, such as Mac
From the installation directory of the DSE installation directory,
use the -t option to start dse. The
-t option starts Hadoop and marks the node for
Analytics:
$ bin/dse cassandra -t
-
Use the dse command in the bin
directory to migrate the data from the MySQL table to text files in the
directory npa_nxx in the CassandraFS.
Use the database username and password or -P instead of
--password to be prompted for the database password. If
the database account is not password-protected, just omit the password
option.
$ bin/dse sqoop import --connect
jdbc:mysql://127.0.0.1/npa_nxx_demo
--username mysql
--password password
--table npa_nxx
--target-dir /npa_nxx
DSE returns this message:
INFO mapreduce.ImportJobBase: Retrieved 105291 records.
-
Delete the npa_nxx.java file in
$DSE_HOME/bin.
-
Use the dse command again, but this time to migrate the data from the
CassandraFS to a Cassandra CQL 2 table, as follows:
$ sudo ./dse sqoop import --connect jdbc:mysql://127.0.0.1/npa_nxx_demo \
--username root \
--table npa_nxx \
--cassandra-keyspace newKS \
--cassandra-column-family npa_nxx_cf \
--cassandra-row-key npa_nxx_key \
--cassandra-thrift-host 127.0.0.1 \
--cassandra-create-schema