Using CQL collections

Hive supports writing to CQL table collections.

Hive supports writing to CQL tables, including tables of collections. To store data to a CQL table from Hive, use prepared statements as shown in these examples:

Prepared statements for a list

UPDATE users SET top_places = ?  where user_id = ?
UPDATE users SET top_places = [ 'rivendell', 'rohan' ] WHERE user_id = 'frodo';

UPDATE users SET top_places = ? + top_places  where user_id = ?
UPDATE users SET top_places = [ 'the shire' ] + top_places WHERE user_id = 'frodo';

UPDATE users SET top_places = top_places - ?  where user_id = ?
UPDATE users SET top_places = top_places - ['riddermark'] WHERE user_id = 'frodo';

Prepared statement for a map

Prepared statements for a set are similar to those for a list.

UPDATE users SET todo = ? where user_id = ?
UPDATE users
     SET todo = { '2012-9-24' : 'enter mordor',
                  '2012-10-2 12:00' : 'throw ring into mount doom' }
     WHERE user_id = 'frodo';

The following queries are handled as a regular value instead of tuples:

UPDATE users SET top_places[2] = ?  where user_id = ?
UPDATE users SET top_places[2] = 'riddermark' WHERE user_id = 'frodo';

UPDATE users SET todo[?] = ?  where user_id = ?
UPDATE users SET todo['2012-10-2 12:10'] = 'die' WHERE user_id = 'frodo';

Example: Alter a set collection

Items in a CQL collection are mapped to the Hive types shown in the Hive to Cassandra type mapping table. The CQL data types not supported in Hive, such as blob, can be used if you transform the fields of that type using a DataStax-provided UDF.

In cqlsh, you create two tables that contain a collection sets and insert data into the tables. In Hive, you create a custom external table that maps to the first CQL table, and then insert data from the second CQL table to the first CQL table. Finally, in cqlsh, you query the second CQL table to verify that the insertion was made.

  1. In cqlsh, create the users table shown in the CQL documentation that contains a set collection column, and insert data into the table:
    cqlsh> CREATE TABLE cql3ks.users (
             user_id text PRIMARY KEY,
             first_name text,
             last_name text,
             emails set text
           );
    cqlsh> INSERT INTO cql3ks.users (user_id, first_name, last_name, emails)
             VALUES('frodo', 'Frodo', 'Baggins',
             {'f@baggins.com', 'baggins@gmail.com'});
  2. Create a second table that contains data about actors:
    cqlsh> CREATE TABLE cql3ks.actors (
             user_id text PRIMARY KEY,
             first_name text,
             last_name text,
             emails settext
           );
    
    cqlsh> INSERT INTO cql3ks.actors (user_id, first_name, last_name, emails)
           VALUES ('ejwood', 'Elijah', 'Wood', {'ejwood@hobbit.com'});
  3. In Hive, create a custom external table named hiveUserTable that maps to the CQL users table. The last couple of lines in the following statement need to be free of line breaks.
    hive> CREATE EXTERNAL TABLE hiveUserTable (emails arraystring,user_id string) STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler' TBLPROPERTIES( "cassandra.ks.name" = "cql3ks", "cassandra.cf.name" = "users", "cql3.partition.key"="user_id", "cql3.output.query" = "update cql3ks.users set emails = emails + ? WHERE user_id = ?");
  4. Add the data from the CQL actors table to the users table:
    hive> INSERT INTO TABLE hiveUserTable SELECT emails,user_id FROM cql3ks.actors;

    The MapReduce job runs and alters the table.

  5. Check that the CQL table contains Elijah Wood's email address:
    cql3ks> SELECT * FROM cql3ks.users;
    
     user_id | emails                                                     | first_name | last_name
    ---------+------------------------------------------------------------+------------+-----------
      ejwood |                                        {ejwood@hobbit.com} |       null |      null
       frodo | {baggins@gmail.com, f@baggins.com, fb@friendsofmordor.org} |      Frodo |   Baggins