Creating a Hive CQL output query

One of the Hive external table properties (TBLPROPERTIES) is the cql3.output.query. The value of this property is a prepared statement that the MapReduce job uses to insert data into the corresponding Cassandra table.

One of the Hive external table properties (TBLPROPERTIES) is the cql3.output.query. The value of this property is a prepared statement that the MapReduce job uses to insert data into the corresponding Cassandra table. The prepared query is identical to the CQL statement for altering the table except the binding of the ? is done by Hive. The ? are bound to the hive columns in the order specified in the external table schema.

You can set TTL (time to live) on data in a column using the cql3.output.query property.

In the example of using a collection set, the external table definition determines the bind variables, '?'s, needed in the prepared statements:

hive> CREATE EXTERNAL TABLE hiveUserTable
   (emails arraystring,user_id string)
   . . .

This external table schema specifes the second column to be the user_id; therefore, this INSERT statement takes the columns emails, user_id from the Cassandra actors table and maps the data into the Hive emails and user_id columns:

hive> INSERT INTO TABLE hiveUserTable SELECT emails,user_id FROM actors;

The following diagram shows the relationship between the tables and the bind variables:



The hiveUserTable includes this prepared query:
"cql3.output.query" =
  "update cql3ks.users set emails = emails + ? WHERE user_id = ?");

The Hive INSERT statement starts the MapReduce job that uses the key value from the actors table in the 'WHERE (user_id) =' clause of prepared statement.

Another example, an abstract one, updates a table having three columns (x,y,z) using a prepared statement. The query looks like this internally:

create external table ( x,y,z ) Stored by  ....
(cql3.output.query = "Update cassX = ?(x) cassY=?(y) where cassZ= ?(z))"

Setting TTL on column data 

You can set the TTL on data in an external table. Decoded the following example of how to set TTL using the cql3.output.query looks like this:

UPDATE users USING TTL 432000 SET 'password' = 'ch@ngem3a' WHERE KEY = 'jsmith';

To set TTL on data in an auto-created table, configure a property named cql.output.query.ttl for the CQL table. Set the property as you would set the comment property. This action sets the TTL for the entire record.