Creating a Hive CQL output query

One of the Hive external table properties (TBLPROPERTIES) is the cql3.output.query. The value of this property is a prepared statement that the MapReduce job uses to insert data into the corresponding Cassandra table.

One of the Hive external table properties (TBLPROPERTIES) is the cql3.output.query. The value of this property is a CQL3 prepared statement that the MapReduce job uses to insert data into the corresponding Cassandra table. The prepared statement must be url-encoded to make special characters readable by Hive. The prepared query is identical to the CQL statement for altering the table except the binding of the ? is done by Hive. The ? are bound to the hive columns in the order specified in the external table schema.

In the example of using a collection set, the external table definition determines the bind variables, '?'s, needed in the prepared statements:

hive> CREATE EXTERNAL TABLE hiveUserTable
   (emails array<string>,user_id string)
   . . .

This external table schema specifes the second column to be the user_id; therefore, this INSERT statement takes the columns emails, user_id from the Cassandra actors table and maps the data into the Hive emails and user_id columns:

hive> INSERT INTO TABLE hiveUserTable SELECT emails,user_id FROM actors;

The following diagram shows the relationship between the tables and the bind variables:



The hiveUserTable includes this prepared query:
"cql3.output.query" =
  "update%20cql3ks.users%20set%20emails%20%3D%20emails
    %20%2B%20%3F%20WHERE%20user_id%20%3D%20%3F");
Decoded, the cql3.output.query looks like this:
UPDATE cql3ks.users SET emails = emails + ? WHERE user_id = ?

The Hive INSERT statement starts the MapReduce job that uses the key value from the actors table in the 'WHERE (user_id) =' clause of prepared statement.

Another example, an abstract one, updates a table having three columns (x,y,z) using a prepared statement. The query looks like this internally:

create external table ( x,y,z ) Stored by  ....
(cql3.output.query = "Update cassX = ?(x) cassY=?(y) where cassZ= ?(z))"