DataStax Enterprise 4.0

Example: Handle a compound primary key

Work with CQL tables in Pig. The tables use compound primary keys. You create the tables in cqlsh and merge them using Pig.

This example, like the previous one, shows you how to work with CQL tables in Pig. The previous example used tables having a simple primary key. The tables in this example use compound primary keys. You create the tables in cqlsh and merge them using Pig.



Procedure

  1. Create a four-column (a, b, c, d) Cassandra table named table1 and another five-column (id, x, y, z, data) table named table2.
    cqlsh:cql3ks> CREATE TABLE table1 (
                    a int,
                    b int,
                    c text,
                    d text,
                    PRIMARY KEY (a,b,c)
                  );
    cqlsh:cql3ks> CREATE TABLE table2 (
                    id int PRIMARY KEY, 
                    x int,
                    y int,
                    z text, 
                    data text
                  );
  2. Insert data into the tables.
    cqlsh:cql3ks> INSERT INTO table1 (a, b , c , d )
                    VALUES ( 1,1,'One','match');
    cqlsh:cql3ks> INSERT INTO table1 (a, b , c , d )
                    VALUES ( 2,2,'Two','match');
    cqlsh:cql3ks> INSERT INTO table1 (a, b , c , d )
                    VALUES ( 3,3,'Three','match');
    cqlsh:cql3ks> INSERT INTO table1 (a, b , c , d )
                    VALUES ( 4,4,'Four','match');
    cqlsh:cql3ks> INSERT INTO table2 (id, x, y, z,data)
                    VALUES (1,5,6,'Fix','nomatch');
    cqlsh:cql3ks> INSERT INTO table2 (id, x, y, z,data)
                    VALUES (2,6,5,'Sive','nomatch');
    cqlsh:cql3ks> INSERT INTO table2 (id, x, y, z,data)
                    VALUES (3,7,7,'Seven','match');
    cqlsh:cql3ks> INSERT INTO table2 (id, x, y, z,data)
                    VALUES (4,8,8,'Eight','match');
    cqlsh:cql3ks> INSERT INTO table2 (id, x, y, z,data)
                    VALUES (5,9,10,'Ninen','nomatch');
  3. Using Pig, add logic to load the data from the Cassandra table2 to a Pig relation. In DataStax Enterprise 4.0.4 and later, use USING CqlNativeStorage instead of USING CqlStorage.
    grunt> moredata = load 'cql://cql3ks/table2' USING CqlStorage;
  4. Convert the data to a tuple.
    grunt> insertformat = FOREACH moredata GENERATE TOTUPLE
             (TOTUPLE('a',x),TOTUPLE('b',y),
             TOTUPLE('c',z)),TOTUPLE(data);

    During the actual data processing, the data is formatted as follows:

    ((PartitionKey_Name,Value),(ClusteringKey_1_name,Value)...)(ArgValue1,ArgValue2,ArgValue3,...)
  5. Save the Pig relation to the Cassandra table1 table. The data from table 1 and table 2 will be merged.
    grunt> STORE insertformat INTO 'cql://cql3ks/table1?output_query=UPDATE%20cql3ks.table1%20SET%20d%20%3D%20%3F' USING CqlStorage;

    The cql:// URL includes a prepared statement, described later, that needs to be copied/pasted as a continuous string (no spaces or line breaks).

  6. In cqlsh, query table1 to check that the data from table1 and table2 have been merged.
    cqlsh:cql3ks> SELECT * FROM table1;
    
     a | b  | c     | d
    ---+----+-------+---------
     5 |  6 |   Fix | nomatch
     1 |  1 |   One |   match
     8 |  8 | Eight |   match
     2 |  2 |   Two |   match
     4 |  4 |  Four |   match
     7 |  7 | Seven |   match
     6 |  5 |  Sive | nomatch
     9 | 10 | Ninen | nomatch
     3 |  3 | Three |   match
Loading Search