Example: Use a CQL composite partition key (deprecated)

Example steps to create a CQL table, and then create an external table in Hive that maps to the CQL table.

Hadoop is deprecated for use with DataStax Enterprise. DSE Hadoop and BYOH (Bring Your Own Hadoop) are deprecated. Hive is also deprecated and will be removed when Hadoop is removed.

This example first creates a CQL table, and then creates an external table in Hive that maps to the CQL table. You cannot use the auto-created external table because Hive does not support the timeuuid or varint types used in the CQL table. You need to declare these types binary in the external table definition. The Hive table uses a SERDE property and declares a single key followed by the column declarations that correspond to columns in the CQL table. Finally, the example queries the CQL table from Hive.

Procedure

In cqlsh, add a table to the cql3ks keyspace created earlier. Create a table that uses a composite partition key.

cql3ks> CREATE TABLE event_table (
          key ascii, factor float, event_type text, event_date timestamp,
          event_id timeuuid, num_responses varint,
          PRIMARY KEY ((key, event_id), num_responses)
        );

Insert data into the table.

cql3ks> INSERT INTO event_table (
          key, factor, event_type, event_date, event_id, num_responses)
          VALUES ('KeyOne', 3.14, 'Q3-launch', '2014-09-03',
          f078d660-1961-11e3-8ffd-0800200c9a66, 1000000
        );

Create a custom external table in Hive named mapped_table that maps to the CQL event_table.

hive> CREATE EXTERNAL TABLE mapped_table(
        key string, factor float, event_type string,
        event_date timestamp, event_id binary, num_responses binary)
        STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'
        WITH SERDEPROPERTIES( "cassandra.ks.name" = "cql3ks",
        "cassandra.cf.name" = "event_table", 'cql3.partition.key'='key,event_id',
        "cassandra.cql3.type" = "ascii, float, text, timestamp, timeuuid, varint"
      );

Trigger a MapReduce job to query the table in Hive.

hive> SELECT COUNT(*) FROM mapped_table;

The output is:

Total MapReduce jobs = 1
Launching Job 1 out of 1
. . .
MapReduce Jobs Launched:
Job 0: Map: 2  Reduce: 1   HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
1
Time taken: 39.929 seconds