Importing joined tables

A common use case example imports multiple tables, which are joined in SQL, to Cassandra.

A common use case is to import multiple tables, which are joined in SQL, to Cassandra. This example shows how to import two tables from MySQL. In MySQL, you use query joins to get famous quotations from one table and the author of the quotation from another. For example:
mysql> SELECT * FROM person INNER JOIN mysql_quotations ON person.id=mysql_quotations.speaker;

+-----+--------------------+---------+----+---------+--------------------------------------------------------------+
| id  | name               | title   | id | speaker | quote                                                        |
+-----+--------------------+---------+----+---------+--------------------------------------------------------------+
| 123 | Christopher Morley | Life    |  1 |     123 | Life is a foreign language; all men mispronounce it.         |
| 123 | Christopher Morley | Life    |  2 |     123 | There are three ingredients in the good life: learning . . . |
| 124 | Daniel Akst        | Life    |  3 |     124 | In matters of self-control as we shall see again and . . .   | 
| 124 | Daniel Akst        | Life    |  4 |     124 | We Have Met the Enemy: Self-Control in an Age of Exc. . .    |
| 125 | Abraham Lincoln    | Success |  5 |     125 | Always bear in mind that your own resolution to . . .        |
| 125 | Abraham Lincoln    | Success |  6 |     125 | Better to remain silent and be thought a fool than . . .     |
| 126 | Albert Einstein    | Success |  7 |     126 | If A is success in life, then A equals x plus y plus . . .   |
+-----+--------------------+---------+----+---------+--------------------------------------------------------------+
7 rows in set (0.00 sec)

This example assumes an analytics node is running.

Procedure

To import SQL tables into CQL using a collection set for the quotations:

  1. Download the import_quotations.zip file.
  2. Create the mysql_quotations and person tables in MySQL. You can copy/paste commands from the downloaded file to produce these tables.
  3. Create the famous_words keyspace and quotations table in cqlsh. You can copy/paste the commands from the downloaded file.
    cqlsh> CREATE KEYSPACE famous_words WITH REPLICATION = 
             {'class':'NetworkTopologyStrategy', 'Analytics':1};
    cqlsh> USE famous_words;
    cqlsh:famous_words> CREATE TABLE quotations (
                          id text PRIMARY KEY,
                          name text,
                          title text,
                          quotes set text
                        );
  4. Insert the data from the downloaded file into the person and mysql_quotations tables.
  5. Create an import options file named import_persons.options having the following contents.
    cql-import
    --table
    person
    --cassandra-keyspace
    famous_words
    --cassandra-table
    quotations
    --cassandra-column-mapping
    id:id,name:name,title:title
    --connect
    jdbc:mysql://127.0.0.1/famous_words
    --username
    root
    --password
    root
    --cassandra-host
    127.0.0.1
  6. Import the person table into the CQL table. On Linux, for example:
    $ sudo bin/dse sqoop --options-file path/import_person.options
    The MapReduce job runs and at then end, looks something like this:
    . . . 
    14/06/16 20:26:43 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 35.1743 seconds (0 bytes/sec)
    14/06/16 20:26:43 INFO mapreduce.ImportJobBase: Retrieved 4 records.
  7. Check that the CQL quotations table now contains the data from the MySQL person table.
    cqlsh:famous_words> SELECT * FROM quotations;
     id  | name               | quotes | title
    -----+--------------------+--------+---------
     123 | Christopher Morley |   null |    Life
     125 |    Abraham Lincoln |   null | Success
     126 |    Albert Einstein |   null | Success
     124 |        Daniel Akst |   null |    Life
    
    (4 rows)
  8. Create another import options file to import the mysql_quotations table. Use a free form query to import the quotations into the CQL table. The literal string $CONDITIONS needs to appear in the WHERE clause of the query. Sqoop replaces the string with its refined constraints. For example:
    cql-import
    --query
    select person.id, person.name, person.title, mysql_quotations.quote from person INNER JOIN mysql_quotations ON person.id=mysql_quotations.speaker WHERE $CONDITIONS
    --target-dir
    /sqoop
    --split-by
    person.id
    --cassandra-keyspace
    famous_words
    --cassandra-table
    quotations
    --cassandra-column-mapping
    id:id,name:name,title:title,quotes:quote
    --connect
    jdbc:mysql://127.0.0.1/famous_words
    --username
    root
    --password
    root
    --cassandra-host
    127.0.0.1
    >
  9. Import the resultset of quotations into the CQL table. On Linux, assuming you named the options file import_quotes.options:
    $ sudo bin/dse sqoop --options-file path/import_quotes.options
  10. Check that the CQL quotations table now contains a collection set of quotations as well as the name of the speaker and other information in the MySQL table.
    cqlsh:famous_words> SELECT * FROM quotations;