Querying database data using Spark SQL in Java

You can execute Spark SQL queries in Java applications that work with table data. A Java application that queries table data using Spark SQL first needs an instance of org.apache.spark.sql.SparkSession.

dse-spark-version.jar

The default location of the dse-spark-version.jar file depends on the type of installation:

Package installations and Installer-Services installations:

    /usr/share/dse/dse-spark-version.jar

Tarball installations and Installer-No Services installations:

    installation_location/lib/dse-spark-version.jar

The Spark session object is used to connect to DataStax Enterprise.

Create the Spark session instance using the builder interface:

SparkSession spark = SparkSession
    .builder()
    .appName("My application name")
    .config("option name", "option value")
    .master("dse://1.1.1.1?connection.host=1.1.2.2,1.1.3.3")
    .getOrCreate();

After the Spark session instance is created, you can use it to create a DataFrame from a query. Queries are executed by calling the SparkSession.sql method, which in the Java API returns a Dataset<Row> (the Java representation of a DataFrame).

Dataset<Row> employees = spark.sql("SELECT * FROM company.employees");
employees.createOrReplaceTempView("employees");
Dataset<Row> managers = spark.sql("SELECT name FROM employees WHERE role = 'Manager'");

The returned object supports the standard Spark DataFrame operations.

employees.collect();
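As a minimal, self-contained sketch of those standard operations, the example below uses a local Spark session and an in-memory Dataset standing in for the company.employees table (the rows, column names, and local[*] master are illustrative assumptions; against DataStax Enterprise you would use the dse:// master URL and query real tables as shown above):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class SparkSqlSketch {
    public static void main(String[] args) {
        // Local session stands in for a dse:// session; the operations are the same.
        SparkSession spark = SparkSession.builder()
            .appName("Spark SQL sketch")
            .master("local[*]")
            .getOrCreate();

        // Hypothetical in-memory rows standing in for company.employees.
        StructType schema = new StructType()
            .add("name", DataTypes.StringType)
            .add("role", DataTypes.StringType);
        List<Row> rows = Arrays.asList(
            RowFactory.create("Alice", "Manager"),
            RowFactory.create("Bob", "Engineer"));
        Dataset<Row> employees = spark.createDataFrame(rows, schema);
        employees.createOrReplaceTempView("employees");

        // Query the temporary view, then apply standard operations to the result.
        Dataset<Row> managers =
            spark.sql("SELECT name FROM employees WHERE role = 'Manager'");
        long managerCount = managers.count();          // number of matching rows
        managers.show();                               // tabular preview on stdout
        List<Row> collected = managers.collectAsList(); // materialize on the driver

        System.out.println(managerCount);
        spark.stop();
    }
}
```

collectAsList is used here rather than collect because it returns a typed List<Row> in Java; collect, as in the earlier example, also works but returns an untyped array.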