If the Hive built-in functions do not provide the capability you need, you can
include your own Java code in a user-defined function (UDF) and invoke it using a
query. For example, DataStax provides a UDF for working with unsupported
data types. The example in this section uses a JAR that converts text from
lowercase to uppercase. After downloading the JAR from the Hadoop tutorial
examples repository and setting up the UDF in Hive, you create a Hive table
and load data into it from a text file installed with DataStax Enterprise.
The contents of the file look like this, where ^A represents the Ctrl-A
field delimiter:
238^Aval_238
86^Aval_86
311^Aval_311
27^Aval_27
165^Aval_165
. . .
When you execute a SELECT statement, you invoke the UDF to convert text in the file
from lowercase to uppercase: val to VAL.
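To make the file format concrete, the following plain-Java sketch splits one line of the sample file at the Ctrl-A character (Hive's default field delimiter for text tables) and uppercases the value. It only illustrates the effect of the conversion on a single row. The class and method names (KvLineSketch, key, upperValue) are hypothetical and are not part of Hive, DataStax Enterprise, or the downloaded JAR.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Hive or DataStax Enterprise.
final class KvLineSketch {
    // Ctrl-A (U+0001), displayed as ^A in the sample file listing.
    static final char FIELD_DELIMITER = '\u0001';

    // Returns the integer key that precedes the delimiter, e.g. 238.
    static int key(String line) {
        return Integer.parseInt(line.substring(0, delimiterIndex(line)));
    }

    // Returns the value that follows the delimiter, converted to uppercase.
    static String upperValue(String line) {
        return line.substring(delimiterIndex(line) + 1).toUpperCase(Locale.ROOT);
    }

    // Finds the first delimiter and rejects lines that do not contain one.
    private static int delimiterIndex(String line) {
        int idx = line.indexOf(FIELD_DELIMITER);
        if (idx < 0) {
            throw new IllegalArgumentException("Line has no Ctrl-A delimiter");
        }
        return idx;
    }

    public static void main(String[] args) {
        String[] sample = {"238\u0001val_238", "86\u0001val_86"};
        for (String line : sample) {
            System.out.println(key(line) + " " + upperValue(line));
        }
    }
}
```

In the procedure itself, Hive does the splitting when it reads the table, and the UDF receives only the value of the bar column.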
Procedure
-
Download the JAR for this
example.
-
On the command line, copy the JAR to the /tmp directory in the Cassandra
File System (CFS) using Hadoop shell
commands. For example:
dse hadoop fs -copyFromLocal local-path-to-jar/myudfs.jar /tmp
Substitute the path to the downloaded JAR in your environment for
local-path-to-jar.
-
Start a Hive client, and at the Hive
prompt, add the JAR file to the Hadoop distributed cache, which copies files to
task nodes for use when the tasks run:
hive> add jar cfs:///tmp/myudfs.jar;
The output on Mac OS X
is:
converting to local cfs:///tmp/myudfs.jar
Added /private/tmp/johndoe/hive_resources/myudfs.jar to class path
Added resource: /private/tmp/johndoe/hive_resources/myudfs.jar
-
At the Hive prompt, create an alias for the UDF associated with the JAR.
hive> CREATE TEMPORARY FUNCTION myUpper AS 'org.hue.udf.MyUpper';
-
Create a Hive table for text data.
hive> CREATE TABLE udftest (foo INT, bar STRING);
-
Load data into the table, substituting the path to the DataStax Enterprise
installation in your environment for install_location.
For example, on Mac OS X:
hive> LOAD DATA LOCAL INPATH
'install_location/resources/hive/examples/files/kv1.txt'
OVERWRITE INTO TABLE udftest;
-
Convert the lowercase text in the table (the instances of val) to
uppercase by invoking the UDF by its alias in a SELECT statement.
hive> SELECT myUpper(bar) FROM udftest;
The mapper output looks like
this:
. . .
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
VAL_238-gg
VAL_86-gg
VAL_311-gg
. . .
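For orientation, a UDF of the kind registered in the procedure is typically a small Java class whose evaluate method Hive calls once per row. The sketch below shows only that conversion logic in plain Java so that it compiles without Hive on the classpath. It is not the source of myudfs.jar, and it does not reproduce the -gg suffix that appears in the sample output. The class name UpperSketch is hypothetical. A classic Hive UDF would additionally extend org.apache.hadoop.hive.ql.exec.UDF and would usually accept and return org.apache.hadoop.io.Text rather than String.

```java
import java.util.Locale;

// Hypothetical stand-in for a UDF class; NOT the source of myudfs.jar.
// Assumption: a real classic-style Hive UDF would extend
// org.apache.hadoop.hive.ql.exec.UDF and use org.apache.hadoop.io.Text.
final class UpperSketch {
    // Called once per row with the column value.
    // A NULL column value arrives as null, so return null instead of throwing.
    public String evaluate(String input) {
        if (input == null) {
            return null;
        }
        return input.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        UpperSketch udf = new UpperSketch();
        System.out.println(udf.evaluate("val_238"));
    }
}
```

Returning null for null input keeps a query from failing on rows where the column is empty, which is the usual convention for simple string UDFs.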