Using TBLPROPERTIES and SERDEPROPERTIES
In an external table definition, the TBLPROPERTIES clause maps the Hive database to a CQL table and can include MapReduce properties, Cassandra database configuration, and native protocol properties for the table. The SERDEPROPERTIES clause specifies the properties used when serializing/deserializing data passed between the Hive table and Cassandra.
In an external table definition, the TBLPROPERTIES clause maps the Hive database to a CQL table and can include MapReduce properties, Cassandra database configuration, and native protocol properties for the table. The SERDEPROPERTIES clause specifies the properties used when serializing/deserializing data passed between the Hive table and Cassandra. You can add a WITH SERDEPROPERTIES clause to map meaningful column names in Hive to the Cassandra partition key, column names, and column values. You can change these properties on the fly. Using the Hive SET command, you can configure properties in the hive session. The settings become effective for the next query.
The following table lists general properties used in the TBLPROPERTIES or SERDEPROPERTIES clause or both. The subsequent section lists additional, optional properties for use with the DataStax Java Driver. The TBL/SERDE column of the following table lists how to declare properties in the table definition, as a TBLPROPERTIES (TBL), a SERDEPROPERTIES (SERDE) or both.
General Property | TBL/SERDE | Description |
---|---|---|
cassandra.cf.name | both | Cassandra table name |
cassandra.columns.mapping | both | Mapping of Hive to legacy Cassandra columns |
cassandra.consistency.level | both | Consistency level - default ONE |
cassandra.cql3.type | both | CQL types |
cassandra.host | both | IP of a Cassandra node to connect to |
cassandra.input.split.size | both | MapReduce split size |
cassandra.ks.name | both | Cassandra keyspace name |
cassandra.partitioner | both | Partitioner (default = configured partitioner) |
cassandra.port | both | Cassandra RPC port - default 9160 |
cql3.output.query | TBL | A prepared statement for storing alterations to a CQL users table |
cql3.partition.key | both | CQL partition key, a comma-separated list of partition and clustering keys |
cql3.pushdown.enable | TBL | True (default) enable pushdown predicate |
cql3.update.columns | both | Used with INSERT INTO SELECT |
Required table properties
When you create an external table in Hive, you need to specify these properties:
- cassandra.ks.name
- cassandra.cf.name
Other frequently-used properties are:
- cql3.output.query
- cql3.partition.key (DataStax Enterprise 4.0.4 and later)
PRIMARY KEY ((key, event_id), num_responses)
Required storage handler
Also required in the external table definition is the CQL storage handler: org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler. The storage handler accesses and stores Cassandra data back to Cassandra.
About the cassandra.input.split.size
The cassandra.input.split.size property configures the number of CQL partitions processed per mapper (64k rows per split). The default is 64 * 1024. If your tables have large partitions (many distinct values of clustering columns for the same partitioning key), do not use the default. Use a lower setting.
Partitioner use by Hive
You do not need to specify cassandra.partitioner. Your configured partitioner is used by Hive. For example, Hive uses this property value if you use the Cassandra 2.1 default partitioner:
"cassandra.partitioner" = "org.apache.cassandra.dht.Murmur3Partitioner"
Creating or altering CQL data from Hive and MapReduce performance show examples of using some of these properties.
Optional native protocol properties
- cassandra.input.native.port
- cassandra.input.native.core.connections.per.host
- cassandra.input.native.max.connections.per.host
- cassandra.input.native.min.simult.reqs.per.connection
- cassandra.input.native.max.simult.reqs.per.connection
- cassandra.input.native.connection.timeout
- cassandra.input.native.read.connection.timeout
- cassandra.input.native.receive.buffer.size
- cassandra.input.native.send.buffer.size
- cassandra.input.native.solinger
- cassandra.input.native.tcp.nodelay
- cassandra.input.native.reuse.address
- cassandra.input.native.keep.alive
- cassandra.input.native.auth.provider
- cassandra.input.native.ssl.trust.store.path
- cassandra.input.native.ssl.key.store.path
- cassandra.input.native.ssl.trust.store.password
- cassandra.input.native.ssl.key.store.password
- cassandra.input.native.ssl.cipher.suites