CQL data access

Use the CqlNativeStorage handler with the input_cql statement or use the output_query statement that was available in earlier releases.

In DataStax Enterprise 4.0.4, to access data in CQL tables, use the CqlNativeStorage handler with the new input_cql statement or use the output_query statement that was available in earlier releases.

In DataStax Enterprise 4.0-4.0.3, to access data in CQL tables, use the CqlStorage() handler. To access data in the CassandraFS, the target keyspace and table must already exist. Data in a Pig relation can be stored in a Cassandra table, but Pig will not create the table.

The Pig LOAD function pulls Cassandra data into a Pig relation through the storage handler as shown in these examples:
  • DataStax Enterprise 4.0.4
    <pig_relation_name> = LOAD 'cql://<keyspace>/<table>' 
        USING CqlNativeStorage(); -- DataStax Enterprise 4.0.4
  • DataStax Enterprise 4.0 - 4.0.3
    <pig_relation_name> = LOAD 'cql://<keyspace>/<table>' 
        USING CqlStorage(); -- DataStax Enterprise 4.0 - 4.0.3
DataStax Enterprise supports these Pig data types:
  • int
  • long
  • float
  • double
  • boolean
  • chararray
The Pig demo examples show how to use the LOAD command.

LOAD schema

The LOAD Schema is:

(colname:colvalue, colname:colvalue, … )

where each colvalue is referenced by the Cassandra column name.

Accessing data using input_cql and CqlNativeStorage handler  

The input_cql statement contains the following components:
  • A SELECT statement that includes the partition key columns
  • A WHERE clause that includes the range of the columns consistent with the order in the cluster and in the following format:
    WHERE token(partitionkey) > ? and token(partitionkey) <= ?
  • The value of the native_port

For example, the input_cql statement before encoding might look like this:

SELECT * FROM ks.tab where token(key) > ? and token (key) <= ?
URL-encode the statement and append it as an argument to the Pig LOAD command using the ?input_cql= syntax.
x = LOAD 'cql://ks/tab?input_cql=SELECT%20*%20FROM%20ks.tab%20where%20token(key)%20%3E%20%3F%20and%20token%20(key)%20%3C%3D%20%3F' USING CqlNativeStorage();
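The encoding step can be done with any URL-encoding tool. The following is a minimal sketch in Python, not part of DataStax Enterprise: the helper name encode_input_cql is hypothetical, and the choice to leave *, ( and ) unescaped is an assumption made only so that the output matches the encoded statement shown above.

```python
from urllib.parse import quote


def encode_input_cql(statement: str) -> str:
    """Percent-encode a CQL statement for use after ?input_cql= (hypothetical helper)."""
    # Assumption: keep *, ( and ) literal to mirror the encoded example in this
    # section; spaces, >, <, = and ? are percent-encoded.
    return quote(statement, safe="*()")


statement = "SELECT * FROM ks.tab where token(key) > ? and token (key) <= ?"
encoded = encode_input_cql(statement)

# Print a Pig LOAD command that embeds the encoded statement.
print(f"x = LOAD 'cql://ks/tab?input_cql={encoded}' USING CqlNativeStorage();")
```

Paste the printed line into the Pig script or the Grunt shell.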
Use an ampersand to append additional parameters. For example, to modify the port used by the Java Driver, append the following parameter and port number.
&native_port=9042

The entire migrated Pig command would look like this:

x = LOAD 'cql://ks/tab?input_cql=SELECT%20*%20FROM%20ks.tab%20where%20token(key)%20%3E%20%3F%20and%20token%20(key)%20%3C%3D%20%3F&native_port=9042' USING CqlNativeStorage();

Optional input_cql parameters

You can use the following parameters with input_cql in DataStax Enterprise 4.0.4 and later, as shown by the example in the previous section. Prefix each appended parameter with an ampersand.
  • &native_port=<native_port>
  • &core_conns=<core_conns>
  • &max_conns=<max_conns>
  • &min_simult_reqs=<min_simult_reqs>
  • &max_simult_reqs=<max_simult_reqs>
  • &native_timeout=<native_timeout>
  • &native_read_timeout=<native_read_timeout>
  • &rec_buff_size=<rec_buff_size>
  • &send_buff_size=<send_buff_size>
  • &solinger=<solinger>
  • &tcp_nodelay=<tcp_nodelay>
  • &reuse_address=<reuse_address>
  • &keep_alive=<keep_alive>
  • &auth_provider=<auth_provider>
  • &trust_store_path=<trust_store_path>
  • &key_store_path=<key_store_path>
  • &trust_store_password=<trust_store_password>
  • &key_store_password=<key_store_password>
  • &cipher_suites=<cipher_suites>
  • &input_cql=<input_cql>
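
When several of these parameters are needed, the location string can be assembled programmatically instead of by hand. The following Python sketch is an illustration only: build_cql_url is a hypothetical helper, and the core_conns value of 2 is an arbitrary example value, not a recommended setting.

```python
from urllib.parse import quote


def build_cql_url(keyspace: str, table: str, input_cql: str, **params) -> str:
    """Build a cql:// location string with an encoded input_cql (hypothetical helper)."""
    # Assumption: *, ( and ) stay literal in the CQL statement, everything else
    # that is not URL-safe is percent-encoded.
    parts = ["input_cql=" + quote(input_cql, safe="*()")]
    # Each optional parameter is joined with an ampersand, in the order given.
    for name, value in params.items():
        parts.append(f"{name}={quote(str(value), safe='')}")
    return f"cql://{keyspace}/{table}?" + "&".join(parts)


url = build_cql_url(
    "ks",
    "tab",
    "SELECT * FROM ks.tab where token(key) > ? and token (key) <= ?",
    native_port=9042,
    core_conns=2,  # arbitrary example value
)
print(f"x = LOAD '{url}' USING CqlNativeStorage();")
```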

Handling special characters in the CQL 

If the input_cql or output_query statement passed to a Pig function contains special characters, URL-encode the prepared statement so that Pig can read the special characters.