Using Pig (deprecated)

DataStax Enterprise includes a Cassandra File System (CFS) enabled Apache Pig Client to provide a high-level programming environment for MapReduce coding.

Hadoop is deprecated for use with DataStax Enterprise. DSE Hadoop and BYOH (Bring Your Own Hadoop) are deprecated. Pig is also deprecated and will be removed when Hadoop is removed.

Pig is a high-level programming environment for MapReduce coding. You can explore big data sets using Pig Latin, a data flow language for programmers. Pig Latin operates on relations, which are similar to tables and are constructed of tuples, which correspond to the rows in a table. Unlike a relational database table, a Pig relation does not require every tuple to contain the same number of fields, and fields in the same position (column) need not be of the same type. Using Pig, you can devise logic for data transformations, such as filtering data and grouping relations. The transformations occur during the MapReduce phase.
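As a rough illustration of the filtering and grouping described above, the following Pig Latin sketch loads a relation, filters its tuples, and groups the result. The input file books.tsv, its tab-separated layout, and the field names are hypothetical, chosen only for this example.

```pig
-- Hypothetical input: a tab-separated file with one book per line.
-- Each line becomes a tuple; the AS clause names and types its fields.
books = LOAD 'books.tsv' USING PigStorage('\t')
        AS (title:chararray, author:chararray, pub_year:int);

-- Filtering: keep only the tuples whose pub_year is 2000 or later.
recent = FILTER books BY pub_year >= 2000;

-- Grouping: collect the filtered tuples into one bag per author.
by_author = GROUP recent BY author;

-- Count the tuples in each author's bag.
counts = FOREACH by_author GENERATE group AS author, COUNT(recent) AS num_books;

-- Print the result; this statement is what triggers job execution.
DUMP counts;
```

Each statement defines a new relation from an earlier one. Pig does no work until DUMP (or STORE) requests output, at which point it compiles the chain of statements into MapReduce jobs.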

Job Trackers are managed automatically.

Pig programs are compiled into MapReduce jobs, executed in parallel by Hadoop, and run in a distributed fashion on a local or remote cluster.

Support for TTL

You can set the TTL (time to live) on Pig data. To do so, use a cql:// URL that includes a prepared statement, as shown in step 10 of the library demo.

Support for CQL collections

Pig in DataStax Enterprise supports CQL collections. The collections must use Pig-supported data types.