Creating or altering CQL data from Hive

Use a Hive external table to create or alter CQL data from Hive. A counterpart to the Hive database/external table must pre-exist in Cassandra as an keyspace/table.

You need to use a Hive external table to create or alter CQL data from Hive. A counterpart to the Hive database/external table must pre-exist in Cassandra as an keyspace/table. When you use a Hive database name that matches a Cassandra keyspace name, DataStax Enterprise automatically generates a Hive external table for each table in the keyspace. If the auto-created external table does not suit your needs, you create a custom external table using different TBL and SERDEPROPERTIES. Use the CREATE EXTERNAL TABLE statement to create such a table.

To use Hive with legacy tables, such as those created using Thrift or the CLI, see DataStax Enterprise 3.0 documentation. Thrift applications require that you configure Cassandra for connection to your application using the rpc connections instead of the default native_transport connection.

Creating a custom external table 

This example assumes you created the cassandra_keyspace and exampletable in "Browsing through Cassandra tables in Hive". A Hive example table is auto-created when you run the USE cassandra_keyspace command on the Hive command line. To use a Hive database or table of a different name than the auto-created ones, but with the same or a similar schema, customize the auto-created external table as shown in this example. The example uses the Hive database named bigdata instead cassandra_keyspace, and the example uses a table named MyHiveTable instead of exampletable. The example specifies the CQL keyspace and table names in the external table definition using the TBLPROPERTIES clause to use the CQL-defined schema.

Creating an custom external table

hive> CREATE DATABASE bigdata;
hive> USE bigdata;
hive> CREATE EXTERNAL TABLE MyHiveTable 
        ( key int,  data string ) 
        STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler' 
        TBLPROPERTIES ( "cassandra.ks.name" = "cassandra_keyspace" , 
          "cassandra.cf.name" = "exampletable" );

Inspecting an auto-created, external table (DataStax Enterprise 4.0.4 and later) 

In Hive, you can use the SHOW CREATE TABLE CQL_table name command to see the schema of a auto-created external table. The output of this command can help you construct a custom Hive external table definition. Assuming you created the table in "Browsing through Cassandra tables in Hive", use the SHOW CREATE TABLE command to see the schema of exampletable.
hive> SHOW CREATE TABLE exampletable;
OK
CREATE EXTERNAL TABLE exampletable(
  key int COMMENT 'from deserializer', 
  data string COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.cassandra.cql3.serde.CqlColumnSerDe' 
STORED BY 
  'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler' 
WITH SERDEPROPERTIES ( 
  'serialization.format'='1', 
  'cassandra.columns.mapping'='key,data')
LOCATION
  'cfs://127.0.0.1/user/hive/warehouse/cassandra_keyspace.db/exampletable'
TBLPROPERTIES (
  'cassandra.partitioner'='org.apache.cassandra.dht.Murmur3Partitioner', 
  'cql3.partition.key'='key', 
  'cassandra.ks.name'='cassandra_keyspace', 
  'cassandra.cf.name'='exampletable', 
  'auto_created'='true')
Time taken: 0.028 seconds, Fetched: 18 row(s)

Updating metadata in Hive when altering tables  

When you run ALTER TABLE, the metadata in Hive is not updated and subsequent Hive and SparkSQL queries fail.

Workaround

  1. Enter the hive shell:
    $ dse hive
  2. In the hive shell, drop the table:
    hive> DROP TABLE your_keyspace.your_table;
  3. To allow Hive to refresh the metadata:
    hive> USE your_keyspace;