Example steps to create a CQL table, and then create an external table in Hive that
maps to the CQL table.
This example first creates a CQL table, and then creates an external table in Hive
that maps to the CQL table. You cannot use the auto-created external table because
Hive does not support the timeuuid or varint types used in the CQL table. You need
to declare these types binary in the external table definition. The Hive table uses
a SERDE property and declares a single key followed by the column declarations that
correspond to columns in the CQL table. Finally, the example queries the CQL table
from Hive.
Procedure
-
In cqlsh, add a table to the cql3ks keyspace
created earlier. Create a table that uses a composite partition key.
cql3ks> CREATE TABLE event_table (
key ascii, factor float, event_type text, event_date timestamp,
event_id timeuuid, num_responses varint,
PRIMARY KEY ((key, event_id), num_responses)
);
-
Insert data into the table.
cql3ks> INSERT INTO event_table (
key, factor, event_type, event_date, event_id, num_responses)
VALUES ('KeyOne', 3.14, 'Q3-launch', '2014-09-03',
f078d660-1961-11e3-8ffd-0800200c9a66, 1000000
);
-
Create a custom external table in Hive named mapped_table that maps to the CQL
event_table.
hive> CREATE EXTERNAL TABLE mapped_table(
key string, factor float, event_type string,
event_date timestamp, event_id binary, num_responses binary)
STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'
WITH SERDEPROPERTIES( "cassandra.ks.name" = "cql3ks",
"cassandra.cf.name" = "event_table", 'cql3.partition.key'='key,event_id',
"cassandra.cql3.type" = "ascii, float, text, timestamp, timeuuid, varint"
);
-
Trigger a MapReduce job to query the table in Hive.
hive> SELECT COUNT(*) FROM mapped_table;
The output is:
Total MapReduce jobs = 1
Launching Job 1 out of 1
. . .
MapReduce Jobs Launched:
Job 0: Map: 2 Reduce: 1 HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
1
Time taken: 39.929 seconds