INSERT INTO SELECT statement

DataStax Enterprise supports the INSERT INTO SELECT statement in Hive.

DataStax Enterprise supports the INSERT INTO SELECT statement in Hive. You set a TBL and SERDE property, and use INSERT INTO SELECT to copy data from one table and insert it into another, or the same, table.

Supported TBL and SERDE properties include the following SERDE property:

cql3.update.columns

You use cql3.update.columns in conjunction with the CQL output query property, cql3.output.query.

The following example shows how to configure these properties and use the INSERT INTO SELECT statement in Hive to insert selective columns from a table into another row of the same Cassandra table. The SELECT statement requires values for each column in the target table. Using fake values satisfies this requirement.

Procedure

  1. Start cqlsh and create a Cassandra keyspace and table.
    cqlsh> CREATE KEYSPACE mykeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
    cqlsh> USE mykeyspace;
    cqlsh> CREATE TABLE mytable (a INT PRIMARY KEY, b INT, c INT, d INT);
    cqlsh> INSERT INTO mytable (a, b, c, d) VALUES (1, 2, 3, 4);
  2. Start the Hive client.
  3. In Hive, use the auto-created database and external table, and select all the data in the table.
    hive> USE mykeyspace;
    hive> SELECT * FROM mytable;
    Output is:
    OK 1 2 3 4 Time taken: 0.138 seconds, Fetched: 1 row(s)
  4. In Hive, alter the external table to configure the prepared statement as the value of the Hive CQL output query. The prepared statement in this example takes values inserted into columns a and b in mytable and maps them to columns b and a, respectively, for insertion into the new row.
    hive> ALTER TABLE mytable SET TBLPROPERTIES ('cql3.output.query' = 'update mykeyspace.mytable set b = ? where a = ?');
    
    hive> ALTER TABLE mytable SET SERDEPROPERTIES ('cql3.update.columns' = 'b,a');
  5. In Hive, execute an INSERT INTO SELECT statement to insert a row of data into mytable. For example, use 4 and 9 as the values to insert into the first two positions (a, b) of the row. The CQL output query will reverse these positions. Use two type-compatible fake values in addition to the values 4 and 9 that you want to insert. In this example, the fake values are an int, 9999, and a column name, d.
    hive> INSERT INTO TABLE mytable SELECT 4, 9, 9999, d FROM mytable;
    The MapReduce job runs:
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    . . .
    MapReduce Jobs Launched:
    Job 0: Map: 2   HDFS Read: 0 HDFS Write: 0 SUCCESS
    Total MapReduce CPU Time Spent: 0 msec
    OK
    Time taken: 31.867 seconds
  6. Check that 4 and 9, and only those values, were inserted:
    hive> SELECT * FROM mytable;
    The fake values are inserted as NULL and only the values specified by the CQL output query are inserted. The output query mapped 4 to column b and 9 to column a.
    OK
    1        2       3       4
    9        4       NULL    NULL
    Time taken: 0.131 seconds, Fetched: 2 row(s)