Using a custom UDF

You can include your own Java code in a user-defined function (UDF) and invoke it using a query.

Use a UDF when the Hive built-in functions do not provide the capability you need. For example, DataStax provides a UDF for working with unsupported data types. The example in this section uses a JAR that converts text from lowercase to uppercase. After downloading the JAR from the Hadoop tutorial examples repository and setting up the UDF in Hive, you create a Hive table and insert data into it from a text file installed with DataStax Enterprise. In the file contents below, ^A stands for the Ctrl-A character, which Hive uses as its default field delimiter:

238^Aval_238
86^Aval_86
311^Aval_311
27^Aval_27
165^Aval_165
. . .
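Each line holds the two columns of the table you create later, separated by the Ctrl-A character. The following Java sketch shows how one such line splits into an INT and a STRING. The class Kv1Parser and its methods are hypothetical names used only for illustration. The sketch shows the file layout only; it is not how Hive itself reads the file.

```java
import java.util.ArrayList;
import java.util.List;

public class Kv1Parser {
    // One row of the udftest table: foo INT, bar STRING.
    public record Row(int foo, String bar) {}

    // Hive's default field delimiter for text tables is Ctrl-A (code point 1),
    // which terminals typically display as ^A.
    static final String FIELD_DELIMITER = "\u0001";

    // Splits one line of kv1.txt into its two columns.
    public static Row parseLine(String line) {
        String[] fields = line.split(FIELD_DELIMITER, 2);
        if (fields.length != 2) {
            throw new IllegalArgumentException("expected 2 fields: " + line);
        }
        return new Row(Integer.parseInt(fields[0].trim()), fields[1]);
    }

    // Parses every non-empty line into a Row.
    public static List<Row> parseAll(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            if (!line.isEmpty()) {
                rows.add(parseLine(line));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("238\u0001val_238", "86\u0001val_86");
        for (Row row : parseAll(sample)) {
            System.out.println(row.foo() + " | " + row.bar());
        }
    }
}
```

Because the table in this example is created without a ROW FORMAT clause, Hive expects this same Ctrl-A layout when the file is loaded.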

When you run a SELECT statement that invokes the UDF, Hive converts the text loaded from the file from lowercase to uppercase: val becomes VAL.
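A classic Hive simple UDF is a Java class that extends org.apache.hadoop.hive.ql.exec.UDF and defines an evaluate method, which Hive locates by reflection. The dependency-free sketch below shows only the conversion logic such a method might contain. The class name UpperSketch is hypothetical. This is not the source of org.hue.udf.MyUpper, and it does not reproduce the -gg suffix that appears in the sample output later in this section.

```java
import java.util.Locale;

public class UpperSketch {
    // Same shape as a Hive simple UDF's evaluate method. A real UDF would
    // extend org.apache.hadoop.hive.ql.exec.UDF; that dependency is omitted here.
    public String evaluate(String input) {
        // A NULL column value arrives as null; return null instead of throwing.
        if (input == null) {
            return null;
        }
        // Locale.ROOT keeps the result independent of the JVM's default locale.
        return input.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        UpperSketch udf = new UpperSketch();
        System.out.println(udf.evaluate("val_238")); // prints VAL_238
    }
}
```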

Procedure

  1. Download the JAR for this example.
  2. On the command line, use a Hadoop shell command to copy the JAR to the /tmp directory in the Cassandra File System (CFS). For example:
    dse hadoop fs -copyFromLocal local-path-to-jar/myudfs.jar /tmp
    Substitute the path to the downloaded JAR in your environment for local-path-to-jar.
  3. Start a Hive client, and at the Hive prompt, add the JAR file to the Hadoop distributed cache, which copies files to the task nodes so that they are available when tasks run:
    hive> add jar cfs:///tmp/myudfs.jar;
    The output on Mac OS X is:
    converting to local cfs:///tmp/myudfs.jar
    Added /private/tmp/johndoe/hive_resources/myudfs.jar to class path
    Added resource: /private/tmp/johndoe/hive_resources/myudfs.jar
  4. At the Hive prompt, create an alias for the UDF class in the JAR:
    hive> CREATE TEMPORARY FUNCTION myUpper AS 'org.hue.udf.MyUpper';
  5. Create a Hive table for text data.
    hive> CREATE TABLE udftest (foo INT, bar STRING);
  6. Insert data into the table, substituting the path to the DataStax Enterprise installation in your environment for install_location. For example, on Mac OS X:
    hive> LOAD DATA LOCAL INPATH
            'install_location/resources/hive/examples/files/kv1.txt'
            OVERWRITE INTO TABLE udftest;
  7. Invoke the UDF by its alias in a SELECT statement to convert the lowercase text in the table (the instances of val) to uppercase:
    hive> SELECT myUpper(bar) from udftest;
    The mapper output looks like this:
    . . .
    MapReduce Jobs Launched:
    Job 0: Map: 1   HDFS Read: 0 HDFS Write: 0 SUCCESS
    Total MapReduce CPU Time Spent: 0 msec
    OK
    VAL_238-gg
    VAL_86-gg
    VAL_311-gg
    . . .
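Steps 4 and 7 bind an alias to a class name and then call that alias once per row. Because Hive's simple-UDF mechanism finds the evaluate method by reflection, the idea can be modeled in a few lines of Java. This is a simplified sketch under stated assumptions, not Hive's function registry: TempFunctionRegistrySketch, UpperUdf, createTemporaryFunction, and select are hypothetical names, and the sketch ignores type conversion, overload resolution, and distributed execution. One registry instance stands in for one Hive session, because a TEMPORARY function exists only for the session that creates it.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class TempFunctionRegistrySketch {
    // Hypothetical UDF class used only for this sketch.
    public static class UpperUdf {
        public String evaluate(String input) {
            return input == null ? null : input.toUpperCase(Locale.ROOT);
        }
    }

    // Alias (normalized to lowercase in this sketch) -> fully qualified class name.
    private final Map<String, String> aliases = new HashMap<>();

    // Rough analogue of: CREATE TEMPORARY FUNCTION alias AS 'class.Name';
    public void createTemporaryFunction(String alias, String className) {
        aliases.put(alias.toLowerCase(Locale.ROOT), className);
    }

    // Rough analogue of: SELECT alias(column) FROM table;
    // loads the class, finds evaluate(String) reflectively, calls it per row.
    public List<String> select(String alias, List<String> column) throws Exception {
        String className = aliases.get(alias.toLowerCase(Locale.ROOT));
        if (className == null) {
            throw new IllegalArgumentException("unknown function alias: " + alias);
        }
        Class<?> udfClass = Class.forName(className);
        Object udf = udfClass.getDeclaredConstructor().newInstance();
        Method evaluate = udfClass.getMethod("evaluate", String.class);
        List<String> out = new ArrayList<>();
        for (String value : column) {
            out.add((String) evaluate.invoke(udf, value));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        TempFunctionRegistrySketch registry = new TempFunctionRegistrySketch();
        registry.createTemporaryFunction("myUpper", UpperUdf.class.getName());
        System.out.println(registry.select("myUpper", List.of("val_238", "val_86")));
        // prints [VAL_238, VAL_86]
    }
}
```

The sketch resolves the method once, before the loop, so the per-row cost is a single reflective call rather than a fresh lookup for every row.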