Saving a Pig relation to Cassandra

The Pig STORE command pushes data from a Pig relation to Cassandra through the CqlStorage handler:

STORE <relation_name> INTO 'cql://<keyspace>/<column_family>?output_query=<prepared_statement>'
  USING CqlStorage();

URL format for CqlStorage

The URL format for CqlStorage is:

cql://[username:password@]<keyspace>/<column_family>[?
  [page_size=<size>]
  [&columns=<col1,col2>]
  [&output_query=<prepared_statement_query>]
  [&where_clause=<clause>]
  [&split_size=<size>]
  [&partitioner=<partitioner>]
  [&use_secondary=true|false]
  [&init_address=<host>]
  [&rpc_port=<port>]]
Where:
  • page_size -- the number of rows per page
  • columns -- the columns to select in the CQL query
  • output_query -- the CQL query used for writing, in prepared statement format
  • where_clause -- the WHERE clause on the indexed columns, which must be URL-encoded
  • split_size -- the number of rows per split
  • partitioner -- the Cassandra partitioner
  • use_secondary -- enables Pig filter partition push-down
  • init_address -- the IP address of the target node
  • rpc_port -- the RPC port of the target node
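
Because values such as output_query and where_clause must be URL-encoded, it can help to assemble the URL programmatically before pasting it into a Pig script. The following is a minimal Python sketch of that idea, not part of CqlStorage itself: the helper name build_cql_url, the keyspace cql3ks, and the table test are hypothetical, and the sketch assumes form-style encoding (spaces become +) and does not encode the username or password.

```python
from urllib.parse import quote_plus


def build_cql_url(keyspace, table, params=None, username=None, password=None):
    """Assemble a cql:// URL, URL-encoding every parameter value.

    Hypothetical helper for illustration; names are not part of CqlStorage.
    """
    auth = f"{username}:{password}@" if username and password else ""
    url = f"cql://{auth}{keyspace}/{table}"
    if params:
        # quote_plus turns spaces into '+', '=' into %3D and '?' into %3F
        query = "&".join(
            f"{key}={quote_plus(str(value))}" for key, value in params.items()
        )
        url = f"{url}?{query}"
    return url


# Assumed keyspace/table names and prepared statement, for illustration only.
url = build_cql_url(
    "cql3ks",
    "test",
    {"output_query": "UPDATE cql3ks.test SET b = ?"},
)
print(url)
# prints cql://cql3ks/test?output_query=UPDATE+cql3ks.test+SET+b+%3D+%3F
```

The printed string can then be used as the target of STORE ... INTO '...' USING CqlStorage();.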

Store schema

The input schema for Store is:

(value, value, value)

where each value's schema consists of the column name and the column value.

The output schema for Store is:

(((name, value), (name, value)), (value ... value), (value ... value))

where the first tuple is the map of partition key and clustering columns, given as (name, value) pairs. Each remaining tuple is a list of values bound to the prepared CQL query used for output.
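
The nesting of the output schema is easier to see with concrete values. The Python sketch below only models the shape of one record; it is not Pig code, and the function name to_store_tuple and the column names id and ts are hypothetical.

```python
def to_store_tuple(key_columns, *bound_value_lists):
    """Model one Store record: a tuple of (name, value) key pairs,
    followed by one tuple per list of bound values.

    Illustration of the schema shape only; not a Pig or Cassandra API.
    """
    keys = tuple((name, value) for name, value in key_columns.items())
    return (keys, *(tuple(values) for values in bound_value_lists))


# Assumed key columns (partition key 'id', clustering column 'ts')
# and two bound values for the prepared statement's placeholders.
record = to_store_tuple({"id": 7, "ts": 1000}, ["alice", 3])
print(record)
# prints ((('id', 7), ('ts', 1000)), ('alice', 3))
```

In a Pig script, a relation with this shape would typically be built with nested tuples before the STORE step.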