Querying database data using Apache Spark™ SQL in Java
Java applications that query table data using Spark SQL first need an instance of org.apache.spark.sql.SparkSession.
The default location of the dse-spark-version.jar file depends on the type of installation:
Installation Type | Location |
---|---|
Package installations + Installer-Services installations | |
Tarball installations + Installer-No Services installations | |
The Spark session object is used to connect to DataStax Enterprise.
Create the Spark session instance using the builder interface:
SparkSession spark = SparkSession
    .builder()
    .appName("My application name")
    .config("option name", "option value")
    .master("dse://1.1.1.1?connection.host=1.1.2.2,1.1.3.3")
    .getOrCreate();
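The master URL passed to the builder above follows the pattern dse://host?connection.host=host,host. When the host addresses come from configuration rather than a literal, the string can be assembled programmatically. The following is a minimal sketch using only the Java standard library; the class name DseMasterUrl and its build method are hypothetical helpers and not part of the Spark or DataStax Enterprise APIs. Omitting the query string when the list is empty is an assumption of this sketch, not something the example above demonstrates.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of Spark or DSE) that assembles a master URL
 * in the form shown above: dse://host?connection.host=host1,host2
 */
final class DseMasterUrl {
    private DseMasterUrl() {}

    /**
     * @param host            address placed directly after "dse://"
     * @param connectionHosts addresses joined with commas into the
     *                        connection.host parameter; may be null or empty
     */
    static String build(String host, List<String> connectionHosts) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        StringBuilder url = new StringBuilder("dse://").append(host);
        // Assumption: the query string is left out when no hosts are supplied.
        if (connectionHosts != null && !connectionHosts.isEmpty()) {
            url.append("?connection.host=")
               .append(String.join(",", connectionHosts));
        }
        return url.toString();
    }
}
```

The result can then be passed to the builder, for example .master(DseMasterUrl.build("1.1.1.1", Arrays.asList("1.1.2.2", "1.1.3.3"))), which yields the same URL as the literal used above.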
After the Spark session instance is created, you can use it to create a DataFrame (in the Spark 2.x Java API, a Dataset<Row>) from a query.
Queries are executed by calling the SparkSession.sql method.
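// Requires: import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row;
// In the Spark 2.x Java API, a DataFrame is represented as Dataset<Row>.
Dataset<Row> employees = spark.sql("SELECT * FROM company.employees");
employees.createOrReplaceTempView("employees");
Dataset<Row> managers = spark.sql("SELECT name FROM employees WHERE role = 'Manager'");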
The returned DataFrame object supports the standard Spark operations.
employees.collect();