Example: Work with an unsupported data type (deprecated)
DataStax Enterprise provides a user defined function (UDF) for converting Hive binary data into string representations of CQL types.
Hadoop is deprecated for use with DataStax Enterprise. DSE Hadoop and BYOH (Bring Your Own Hadoop) are deprecated. Hive is also deprecated and will be removed when Hadoop is removed.
DataStax Enterprise provides a user defined function (UDF) for converting Hive binary data into string representations of CQL types. Hive cannot auto-create an external table that maps to the unsupported types, such as Cassandra blobs. You have to create a custom external table in Hive and map these types to binary. To read the data in Hive, use a provided UDF to convert the data.
Create the keyspace and two tables in cqlsh
This example first creates a keyspace and two tables in cqlsh and inserts data of every supported type into the tables.
- Start cqlsh. For example, on
Linux.
./cqlsh
- In cqlsh, create a
keyspace:
cqlsh> CREATE KEYSPACE cql3ks WITH replication = { 'class': 'NetworkTopologyStrategy', 'Analytics': '1' };
- Using cqlsh, create a table in the cql3ks keyspace having columns of every CQL data
type.
cqlsh> USE cql3ks; cql3ks> CREATE TABLE genericAdd ( key ascii PRIMARY KEY, a bigint, b blob, c boolean, d decimal, e double, f float, g inet, h int, i text, j timestamp, k uuid, l timeuuid, m varint);
- Insert some data into the
table.
cql3ks> INSERT INTO genericAdd ( key,a,b,c,d,e,f,g,h,i,j,k,l,m) VALUES ('KeyOne', 100005, 0xBEEFFEED, true, 3.5,-1231.4, 3.14, '128.2.4.1', 42, 'SomeText', '2008-10-03', e3d81c40-1961-11e3-8ffd-0800200c9a66, f078d660-1961-11e3-8ffd-0800200c9a66, 1000000);
- Create a second table, genericToAdd, containing every data type and insert different data
into the
table.
cql3ks> CREATE TABLE genericToAdd ( id int PRIMARY KEY, key ascii, a bigint, b blob, c boolean, d decimal, e double, f float, g inet, h int, i text, j timestamp, k uuid, l timeuuid, m varint);
- Insert some data into the second
table.
cql3ks> INSERT INTO genericToAdd ( id,key,a,b,c,d,e,f,g,h,i,j,k,l,m) VALUES (1,'Oneness',1, 0x11111111, true, 1.11,-1111.1,1.11, '111.1.1.1', 11,'11111','1999-11-01', e3d81c40-1961-11e3-8ffd-0800200c9a66, f078d660-1961-11e3-8ffd-0800200c9a66, 1);
Create an external table in Hive
- Create a table in Hive that includes a cql3.output.query property that has the value of a
prepared statement for inserting the data from the second, genericToAdd, table into the first,
genericAdd, table.
The last couple of lines in the following statement need to be free of line breaks. If you copy/paste this statement directly from the documentation and do not remove line breaks, an error occurs in the subsequent step.
hive> CREATE EXTERNAL TABLE hive_genericadd ( key string, a bigint, b binary, c boolean, d decimal, e double, f float, g binary, h int, i string, j timestamp, k binary, l binary, m binary) STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler' TBLPROPERTIES ( "cassandra.ks.name" = "cql3ks", "cassandra.cf.name" = "genericadd", "cql3.partition.key"="key", "cql3.output.query" = "INSERT INTO cql3ks.genericadd (key,a,b,c,d,e,f,g,h,i,j,k,l,m) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)");
- Use the
INSERT
statement to start the MapReduce job that inserts the data from the second CQL table into the first one.hive> INSERT INTO TABLE hive_genericadd SELECT key,a,b,c,d,e,f,g,h,i,j,k,l,m FROM cql3ks.generictoadd;
The MapReduce job runs.
Total MapReduce jobs = 1 Launching Job 1 out of 1 . . . Job 0: Map: 2 HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 0 msec OK Time taken: 33.278 seconds
- Create an alias for the UDF provided by
DataStax.
hive> CREATE TEMPORARY FUNCTION c_to_string AS 'org.apache.hadoop.hive.cassandra.ql.udf.UDFCassandraBinaryToString';
- Select the data of the unsupported blob type from the Hive table by calling the
UDF.
hive> select c_to_string(b, 'blob') from hive_genericadd;
The MapReduce job runs, and the output correctly displays the values:
Total MapReduce jobs = 1
. . .
Job 0: Map: 2 HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
beeffeed
11111111