Importing a text file into a table
This example shows how to use Spark to import a local or DSEFS-based text file into an existing table. You use the saveToCassandra method, which the Spark Cassandra Connector adds to Spark RDDs, to save an arbitrary RDD to the database.
Procedure
- Create a keyspace and a table in the database. For example, use cqlsh:

  CREATE KEYSPACE int_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'Analytics': 1};
  USE int_ks;
  CREATE TABLE int_compound (
      pkey int,
      ckey1 int,
      data1 int,
      PRIMARY KEY (pkey, ckey1)
  );

- Insert data into the table:

  INSERT INTO int_compound ( pkey, ckey1, data1 ) VALUES ( 1, 2, 3 );
  INSERT INTO int_compound ( pkey, ckey1, data1 ) VALUES ( 2, 3, 4 );
  INSERT INTO int_compound ( pkey, ckey1, data1 ) VALUES ( 3, 4, 5 );
  INSERT INTO int_compound ( pkey, ckey1, data1 ) VALUES ( 4, 5, 1 );
  INSERT INTO int_compound ( pkey, ckey1, data1 ) VALUES ( 5, 1, 2 );
- Create a text file named normalfill.csv that contains this data:

  6,7,8
  7,8,6
  8,6,7

- Put the CSV file into DSEFS. For example:

  dse fs "put file://mypath/normalfill.csv dsefs:/"

- Start the Spark shell.
- Verify that Spark can access the int_ks keyspace:

  :showSchema int_ks

  ========================================
   Keyspace: int_ks
  ========================================
   Table: int_compound
  ----------------------------------------
   - pkey  : Int (partition key column)
   - ckey1 : Int (clustering column)
   - data1 : Int

  int_ks appears in the list of keyspaces.

- Read in the file, splitting it on the comma delimiter. Transform each element into an Integer:

  val normalfill = sc.textFile("/normalfill.csv").map(line => line.split(",").map(_.toInt))

  normalfill: org.apache.spark.rdd.RDD[Array[Int]] = MappedRDD[2] at map at console:22

  Alternatively, read in the file from the local file system:

  val file = sc.textFile("file:///local-path/normalfill.csv")

  file: org.apache.spark.rdd.RDD[String] = MappedRDD[4] at textFile at console:22
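The per-line transformation that this step applies to each RDD element can be sketched outside Spark with plain Scala collections. This is a sketch only; the sample lines below mirror normalfill.csv from this example:

```scala
// Plain-Scala sketch of the function passed to the RDD's map:
// split each line on the comma delimiter, then parse every field to Int.
object ParseSketch {
  def parseLine(line: String): Array[Int] =
    line.split(",").map(_.toInt)

  def main(args: Array[String]): Unit = {
    // Contents of normalfill.csv, one String per line.
    val lines = Seq("6,7,8", "7,8,6", "8,6,7")
    val parsed: Seq[Array[Int]] = lines.map(parseLine)
    parsed.foreach(row => println(row.mkString(",")))
  }
}
```

In the Spark job, sc.textFile produces the RDD of lines and the same function runs on each element across the cluster.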
- Check that Spark can find and read the CSV file:

  normalfill.take(1)

  res2: Array[Array[Int]] = Array(Array(6, 7, 8))
- Save the new data to the database:

  normalfill.map(line => (line(0), line(1), line(2))).saveToCassandra("int_ks", "int_compound", Seq("pkey", "ckey1", "data1"))

  The step produces no output.
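Before saving, each parsed Array[Int] row is mapped to a three-element tuple whose positions line up with the column names passed in the Seq. A plain-Scala sketch of that mapping, using the rows from normalfill.csv:

```scala
// Sketch of the row-to-tuple mapping that precedes saveToCassandra:
// positions 0, 1, 2 of each parsed row become (pkey, ckey1, data1).
object RowToTuple {
  def toRow(line: Array[Int]): (Int, Int, Int) =
    (line(0), line(1), line(2))

  def main(args: Array[String]): Unit = {
    val parsed = Seq(Array(6, 7, 8), Array(7, 8, 6), Array(8, 6, 7))
    parsed.map(toRow).foreach(println)
  }
}
```

The connector matches the first tuple element to "pkey", the second to "ckey1", and the third to "data1", so the tuple order must follow the column Seq.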
- Check that the data was saved using cqlsh:

  SELECT * FROM int_ks.int_compound;

   pkey | ckey1 | data1
  ------+-------+-------
      5 |     1 |     2
      1 |     2 |     3
      8 |     6 |     7
      2 |     3 |     4
      4 |     5 |     1
      7 |     8 |     6
      6 |     7 |     8
      3 |     4 |     5

  (8 rows)
