Getting started with the Spark Cassandra Connector Java API

The Spark Cassandra Connector Java API allows you to create Java applications that use Spark to analyze Cassandra data.

See the Spark Cassandra Connector Java Doc on GitHub. See the component versions for the latest version of the Spark Cassandra Connector used by DataStax Enterprise.

Using the Java API in SBT build files 

Add the following library dependency to the build.sbt or other SBT build file.

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.2" withSources() withJavadoc()

For example project templates, see https://github.com/datastax/SparkBuildExamples

Using the Java API in Maven build files 

Add the following dependencies to the pom.xml file:

<dependencies>
    <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector</artifactId>
        <version>1.6.2</version>
    </dependency>
    ...
</dependencies>

To use the helper classes included in dse-spark.jar in your applications, copy dse-spark.jar to project_directory/lib and add the following dependency to your pom.xml file:

<dependency>
    <groupId>com.datastax</groupId>
    <artifactId>dse</artifactId>
    <version>version number</version>
    <scope>system</scope>
    <systemPath>${project.basedir}/lib/dse-spark-version number.jar</systemPath>
</dependency>

Alternatively, you can manually install dse-spark.jar in your local Maven repository:

mvn install:install-file -Dfile=path/dse-version number.jar -DgroupId=com.datastax -DartifactId=dse -Dversion=version number -Dpackaging=jar

And then add the dependency to pom.xml:

<dependency>
    <groupId>com.datastax</groupId>
    <artifactId>dse</artifactId>
    <version>version number</version>
</dependency>

For example project templates, see https://github.com/datastax/SparkBuildExamples

Accessing Cassandra data in Scala applications 

To perform Spark actions on Cassandra table data, you first obtain an RDD object. To create the RDD object, create a Spark configuration object, use it to create a Spark context object, and then call the cassandraTable method on the context.

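import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

// Point the connector at a Cassandra node
val conf = new SparkConf(true)
   .set("spark.cassandra.connection.host", "127.0.0.1")
// Connect to the Spark master and create the context
val sc = new SparkContext("spark://127.0.0.1:7077", "test", conf)
// Read the table as an RDD
val rdd = sc.cassandraTable("my_keyspace", "my_table")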

To save data to Cassandra in Scala applications, use the saveToCassandra method, passing in the keyspace, table, and mapping information.

val collection = sc.parallelize(Seq(("key3", 3), ("key4", 4)))
collection.saveToCassandra("my_keyspace", "my_table", SomeColumns("key", "value"))

Accessing Cassandra data in Java applications 

To perform Spark actions on Cassandra table data, you first obtain a CassandraJavaRDD object, a subclass of the JavaRDD class. The CassandraJavaRDD is the Java language equivalent of the CassandraRDD object used in Scala applications.

To create the CassandraJavaRDD object, create a Spark configuration object, which is then used to create a Spark context object.

SparkConf conf = new SparkConf()
        .setAppName("My application");
SparkContext sc = new SparkContext(conf);

The default location of the dse-spark-version.jar file depends on the type of installation:

Installer-Services and Package installations: /usr/share/dse/dse-spark-version.jar
Installer-No Services and Tarball installations: install_location/lib/dse-spark-version.jar

Use the static methods of the com.datastax.spark.connector.japi.CassandraJavaUtil class to get and manipulate CassandraJavaRDD instances. To get a new CassandraJavaRDD instance, call one of the javaFunctions methods in CassandraJavaUtil, pass in a context object, and then call the cassandraTable method and pass in the keyspace, table name, and mapping class.

import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapColumnTo;

JavaRDD<String> cassandraRdd = CassandraJavaUtil.javaFunctions(sc)
        .cassandraTable("my_keyspace", "my_table", mapColumnTo(String.class))
        .select("my_column");

Mapping Cassandra column data to Java types 

You can specify the Java type of a single column from a table row by calling the mapColumnTo method with that type when creating the CassandraJavaRDD<T> instance. Then call the select method to set the column name in Cassandra.

JavaRDD<Integer> cassandraRdd = CassandraJavaUtil.javaFunctions(sc)
        .cassandraTable("my_keyspace", "my_table", mapColumnTo(Integer.class))
        .select("column1");

JavaBeans classes can be mapped using the mapRowTo method. The JavaBeans property names should correspond to the column names following the default mapping rules. For example, the firstName property will map by default to the first_name column name.

JavaRDD<Person> personRdd = CassandraJavaUtil.javaFunctions(sc)
                .cassandraTable("my_keyspace", "my_table", mapRowTo(Person.class));
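
The default naming rule described above can be illustrated with a short, standalone sketch. The ColumnNames class and its toColumnName method are hypothetical names introduced here for illustration; they approximate the camelCase-to-underscore convention and are not the connector's own mapping code.

```java
import java.util.Locale;

// Illustration only: approximates the convention that a JavaBeans property
// such as firstName corresponds to the column first_name.
// This class is hypothetical and is not part of the connector.
final class ColumnNames {
    private ColumnNames() {}

    // Inserts an underscore before each capital letter that follows a
    // lowercase letter or digit, then lowercases the whole name.
    static String toColumnName(String propertyName) {
        return propertyName
                .replaceAll("([a-z0-9])([A-Z])", "$1_$2")
                .toLowerCase(Locale.ROOT);
    }
}
```

With this sketch, ColumnNames.toColumnName("firstName") yields first_name, while a property name without capital letters, such as birthdate, is returned unchanged.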

CassandraJavaPairRDD<K, V> instances are extensions of the JavaPairRDD class, and have mapping readers for rows and columns similar to the previous examples. These pair RDDs are typically used for key/value pairs, where the first type is the key and the second type is the value.

When mapping a single column for both the key and the value, call mapColumnTo once for the key type and once for the value type, then call the select method and pass in the key and value column names.

CassandraJavaPairRDD<Integer, String> pairRdd = CassandraJavaUtil.javaFunctions(sc)
        .cassandraTable("my_keyspace", "my_table", mapColumnTo(Integer.class), mapColumnTo(String.class))
        .select("id", "first_name");

Use the mapRowTo method to map row data to a Java type. For example, to create a pair RDD instance with the primary key and then a JavaBeans object:

CassandraJavaPairRDD<Integer, Person> idPersonRdd = CassandraJavaUtil.javaFunctions(sc)
        .cassandraTable("my_keyspace", "my_table", mapColumnTo(Integer.class), mapRowTo(Person.class))
        .select("id", "first_name", "last_name", "birthdate", "email");
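
The mapRowTo examples above assume a Person class that follows JavaBeans conventions. A minimal sketch of such a class might look like the following; the property types (Integer, String, java.util.Date) are assumptions chosen to match the selected columns, not a definition taken from the connector or from an actual table schema.

```java
import java.io.Serializable;
import java.util.Date;

// Hypothetical bean for illustration. In a real project, declare it public
// in its own Person.java file. Property types are assumptions.
class Person implements Serializable {
    private static final long serialVersionUID = 1L;

    private Integer id;
    private String firstName;   // corresponds to the first_name column
    private String lastName;    // corresponds to the last_name column
    private Date birthdate;
    private String email;

    // JavaBeans require a public no-argument constructor.
    public Person() {}

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }

    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }

    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }

    public Date getBirthdate() { return birthdate; }
    public void setBirthdate(Date birthdate) { this.birthdate = birthdate; }

    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }
}
```

The class implements Serializable because Spark may need to send objects stored in an RDD between nodes.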

Saving data to Cassandra in Java applications 

To save data from an RDD to Cassandra, pass the RDD to CassandraJavaUtil.javaFunctions and call the writerBuilder method on the result, passing in the keyspace, table name, and optionally type mapping information for the column or row. Then call saveToCassandra to write the data.

CassandraJavaUtil.javaFunctions(personRdd)
        .writerBuilder("my_keyspace", "my_table", mapToRow(Person.class))
        .saveToCassandra();