Example: Use a CQL composite partition key

Example steps to create a CQL table, and then create an external table in Hive that maps to the CQL table.

This example first creates a CQL table, and then creates an external table in Hive that maps to the CQL table. The Hive table uses a SERDE property and declares a single key followed by the number of column declarations that match the number of columns in the CQL table. Finally, the example queries the CQL table from Hive.

Procedure

  1. In cqlsh, add a table to the cql3ks keyspace created earlier. Create a table that uses a composite partition key.
    cql3ks> CREATE TABLE event_table (
              key ascii, factor float, event_type text, event_date timestamp,
              event_id timeuuid, num_responses varint,
              PRIMARY KEY ((key, event_id), num_responses)
            );
  2. Insert data into the table.
    cql3ks> INSERT INTO event_table (
              key, factor, event_type, event_date, event_id, num_responses)
              VALUES ('KeyOne', 3.14, 'Q3-launch', '2014-09-03',
              f078d660-1961-11e3-8ffd-0800200c9a66, 1000000
            );
  3. Create the external table in Hive.
    • DataStax Enterprise 4.0.4 and later
    hive> CREATE EXTERNAL TABLE mapped_table(
            key string, factor float, event_type string,
            event_date timestamp, event_id binary, num_responses binary)
            STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'
            WITH SERDEPROPERTIES( "cassandra.ks.name" = "cql3ks",
            "cassandra.cf.name" = "event_table", 
            "cql3.partition.key" = "key,event_id",
            "cassandra.cql3.type" = "ascii, float, text, timestamp, timeuuid, varint"
          );
    • DataStax Enterprise 4.0 - 4.0.1
    hive> CREATE EXTERNAL TABLE mapped_table(
            key string, factor float, event_type string,
            event_date timestamp, event_id binary, num_responses binary)
            STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'
            WITH SERDEPROPERTIES( "cassandra.ks.name" = "cql3ks",
            "cassandra.cf.name" = "event_table",
            "cassandra.cql3.type" = "ascii, float, text, timestamp, timeuuid, varint"
          );
  4. Trigger a MapReduce job to query the table in Hive.
    hive> SELECT COUNT(*) FROM mapped_table;
    

    The output is:

    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    . . .
    MapReduce Jobs Launched:
    Job 0: Map: 2  Reduce: 1   HDFS Read: 0 HDFS Write: 0 SUCCESS
    Total MapReduce CPU Time Spent: 0 msec
    OK
    1
    Time taken: 39.929 seconds