Example steps to create a CQL table, and then create an external table in Hive that
maps to the CQL table.
This example first creates a CQL table, and then creates an external table in Hive
that maps to the CQL table. The Hive table uses a SERDE property and declares a
single key followed by the number of column declarations that match the number of
columns in the CQL table. Finally, the example queries the CQL table from Hive.
Procedure
-
In cqlsh, add a table to the cql3ks keyspace
created earlier. Create a table that uses a composite partition key.
cql3ks> CREATE TABLE event_table (
key ascii, factor float, event_type text, event_date timestamp,
event_id timeuuid, num_responses varint,
PRIMARY KEY ((key, event_id), num_responses)
);
-
Insert data into the table.
cql3ks> INSERT INTO event_table (
key, factor, event_type, event_date, event_id, num_responses)
VALUES ('KeyOne', 3.14, 'Q3-launch', '2014-09-03',
f078d660-1961-11e3-8ffd-0800200c9a66, 1000000
);
-
Create the external table in Hive.
- DataStax Enterprise 4.0.4 and later
hive> CREATE EXTERNAL TABLE mapped_table(
key string, factor float, event_type string,
event_date timestamp, event_id binary, num_responses binary)
STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'
WITH SERDEPROPERTIES( "cassandra.ks.name" = "cql3ks",
"cassandra.cf.name" = "event_table",
"cql3.partition.key" = "key,event_id",
"cassandra.cql3.type" = "ascii, float, text, timestamp, timeuuid, varint"
);
- DataStax Enterprise 4.0 - 4.0.1
hive> CREATE EXTERNAL TABLE mapped_table(
key string, factor float, event_type string,
event_date timestamp, event_id binary, num_responses binary)
STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'
WITH SERDEPROPERTIES( "cassandra.ks.name" = "cql3ks",
"cassandra.cf.name" = "event_table",
"cassandra.cql3.type" = "ascii, float, text, timestamp, timeuuid, varint"
);
-
Trigger a MapReduce job to query the table in Hive.
hive> SELECT COUNT(*) FROM mapped_table;
The output is:
Total MapReduce jobs = 1
Launching Job 1 out of 1
. . .
MapReduce Jobs Launched:
Job 0: Map: 2 Reduce: 1 HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
1
Time taken: 39.929 seconds