CQL pushdown filter

Optimize the processing of the data by moving filtering expressions in Pig as close to the data source as possible.

DataStax Enterprise includes a CqlStorage URL option, use_secondary. When the option is set to true, filtering expressions in Pig are moved as close to the data source as possible. To use this capability:
  • Create an index for the Cassandra table.

    For Pig pushdown filtering, the secondary index must have the same name as the column being indexed.

  • Include the use_secondary option with a value of true in the URL format for the storage handler. The option name reflects the former term for a Cassandra index: secondary index. For example:

    newdata = LOAD 'cql://ks/cf_300000_keys_50_cols?use_secondary=true'
                USING CqlStorage();          -- DataStax Enterprise 4.5.0 - 4.5.1

    newdata = LOAD 'cql://ks/cf_300000_keys_50_cols?use_secondary=true'
                USING CqlNativeStorage();    -- DataStax Enterprise 4.5.2
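
The two steps above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not a definitive procedure. The column name col_5 and the value 'value_1' are hypothetical, and col_5 is assumed to be a text column. The keyspace and table names are taken from the example above. The sketch also assumes the script runs in the DataStax Enterprise Pig shell on 4.5.2, with CqlNativeStorage usable as in the example above (on 4.5.0 - 4.5.1, CqlStorage would take its place). The CQL statement appears as a comment because it is run in cqlsh, not in Pig.

```pig
-- Step 1 (run once in cqlsh, not in Pig): index the column to filter on.
-- The index name matches the column name, as pushdown filtering requires.
-- col_5 is a hypothetical column name.
--   CREATE INDEX col_5 ON ks.cf_300000_keys_50_cols (col_5);

-- Step 2: load the table with pushdown filtering enabled.
newdata = LOAD 'cql://ks/cf_300000_keys_50_cols?use_secondary=true'
            USING CqlNativeStorage();

-- A filter on the indexed column; 'value_1' is a hypothetical value.
filtered = FILTER newdata BY col_5 == 'value_1';

-- Print the matching rows.
DUMP filtered;
```

An equality filter on the indexed column, as in this sketch, is the kind of expression the use_secondary option is meant to move toward the data source.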