Using TBLPROPERTIES and SERDEPROPERTIES

In an external table definition, the TBLPROPERTIES clause maps the Hive database to a CQL table and can include MapReduce properties, Cassandra database configuration, and native protocol properties for the table. The SERDEPROPERTIES clause specifies the properties used when serializing/deserializing data passed between the Hive table and Cassandra.

In an external table definition, the TBLPROPERTIES clause maps the Hive database to a CQL table and can include MapReduce properties, Cassandra database configuration, and native protocol properties for the table. The SERDEPROPERTIES clause specifies the properties used when serializing/deserializing data passed between the Hive table and Cassandra. You can add a WITH SERDEPROPERTIES clause to map meaningful column names in Hive to the Cassandra partition key, column names, and column values. You can change these properties on the fly. Using the Hive SET command, you can configure properties in the hive session. The settings become effective for the next query.

The following table lists general properties used in the TBLPROPERTIES or SERDEPROPERTIES clause or both. The subsequent section lists additional, optional properties for use with the DataStax Java Driver. The TBL/SERDE column of the following table lists how to declare properties in the table definition, as a TBLPROPERTIES (TBL), a SERDEPROPERTIES (SERDE) or both.

General TBL and SERDE properties
General Property TBL/SERDE Description
cassandra.cf.name both Cassandra table name
cassandra.columns.mapping both Mapping of Hive to legacy Cassandra columns
cassandra.consistency.level both Consistency level - default ONE
cassandra.cql3.type both CQL types
cassandra.host both IP of a Cassandra node to connect to
cassandra.input.split.size both MapReduce split size
cassandra.ks.name both Cassandra keyspace name
cassandra.partitioner both Partitioner (default = configured partitioner)
cassandra.port both Cassandra RPC port - default 9160
cql3.output.query TBL A prepared statement for storing alterations to a CQL users table
cql3.partition.key both CQL partition key, a comma-separated list of partition and clustering keys
cql3.pushdown.enable TBL True (default) enable pushdown predicate
cql3.update.columns both Used with INSERT INTO SELECT

Required table properties

When you create an external table in Hive, you need to specify these properties:

  • cassandra.ks.name
  • cassandra.cf.name

Other frequently-used properties are:

  • cql3.output.query
  • cql3.partition.key (DataStax Enterprise 4.0.4 and later)
You use the SHOW CREATE TABLE CQL_table_name command at the Hive prompt to see the auto-created external table. The output helps you see how to format the cql3.partition.key in your custom external table. For example, the output of a table having following CQL composite partition key, has the 'cql3.partition.key'='key,event_id' Hive property syntax:
PRIMARY KEY ((key, event_id), num_responses)

Required storage handler

Also required in the external table definition is the CQL storage handler: org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler. The storage handler accesses and stores Cassandra data back to Cassandra.

About the cassandra.input.split.size

The cassandra.input.split.size property configures the number of CQL partitions processed per mapper (64k rows per split). The default is 64 * 1024. If your tables have large partitions (many distinct values of clustering columns for the same partitioning key), do not use the default. Use a lower setting.

Partitioner use by Hive

You do not need to specify cassandra.partitioner. Your configured partitioner is used by Hive. For example, Hive uses this property value if you use the Cassandra 2.1 default partitioner:

"cassandra.partitioner" = "org.apache.cassandra.dht.Murmur3Partitioner"

Creating or altering CQL data from Hive and MapReduce performance show examples of using some of these properties.

Optional native protocol properties 

DataStax Enterprise supports the following optional properties for the native protocol.
  • cassandra.input.native.port
  • cassandra.input.native.core.connections.per.host
  • cassandra.input.native.max.connections.per.host
  • cassandra.input.native.min.simult.reqs.per.connection
  • cassandra.input.native.max.simult.reqs.per.connection
  • cassandra.input.native.connection.timeout
  • cassandra.input.native.read.connection.timeout
  • cassandra.input.native.receive.buffer.size
  • cassandra.input.native.send.buffer.size
  • cassandra.input.native.solinger
  • cassandra.input.native.tcp.nodelay
  • cassandra.input.native.reuse.address
  • cassandra.input.native.keep.alive
  • cassandra.input.native.auth.provider
  • cassandra.input.native.ssl.trust.store.path
  • cassandra.input.native.ssl.key.store.path
  • cassandra.input.native.ssl.trust.store.password
  • cassandra.input.native.ssl.key.store.password
  • cassandra.input.native.ssl.cipher.suites