Query database data using Apache Spark SQL in Java
Java applications that query table data using Spark SQL first need an instance of org.apache.spark.sql.SparkSession.
The Apache Spark™ session object is used to connect to DataStax Enterprise (DSE).
Create the Spark session instance using the builder interface:
SparkSession spark = SparkSession
    .builder()
    .appName("My application name")
    .config("<option name>", "<option value>")
    .master("dse://1.1.1.1?connection.host=1.1.2.2,1.1.3.3")
    .getOrCreate();
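The master URL in the example above has the shape dse://<host>?connection.host=<host>,<host>. As a minimal sketch, assuming the list of hosts comes from application configuration, the string can be assembled by a small helper. DseMasterUrl and its build method are hypothetical names used only for illustration and are not part of the DSE or Spark APIs. The helper only reproduces the URL shape shown above and does not validate the addresses.

```java
import java.util.List;

// Hypothetical helper (not part of DSE or Spark): assembles a master URL
// in the same shape as the example, dse://<host>?connection.host=<h1>,<h2>
final class DseMasterUrl {
    private DseMasterUrl() {}

    static String build(String primaryHost, List<String> additionalHosts) {
        if (primaryHost == null || primaryHost.isEmpty()) {
            throw new IllegalArgumentException("primaryHost must not be empty");
        }
        String url = "dse://" + primaryHost;
        // Without additional hosts, return the bare dse://<host> form.
        if (additionalHosts == null || additionalHosts.isEmpty()) {
            return url;
        }
        // Additional hosts are joined with commas, as in the example URL.
        return url + "?connection.host=" + String.join(",", additionalHosts);
    }
}
```

The result can then be passed to the builder, for example .master(DseMasterUrl.build("1.1.1.1", Arrays.asList("1.1.2.2", "1.1.3.3"))).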
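After the Spark session instance is created, you can use it to create a DataFrame from a query. Queries are executed by calling the SparkSession.sql method. In the Java API a DataFrame is represented by the type Dataset<Row>, and a DataFrame is registered as a temporary view with createOrReplaceTempView so that later queries can refer to it by name.
Dataset<Row> employees = spark.sql("SELECT * FROM company.employees");
employees.createOrReplaceTempView("employees");
Dataset<Row> managers = spark.sql("SELECT name FROM employees WHERE role = 'Manager'");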
The returned DataFrame object supports the standard Spark operations, for example collecting the result rows to the driver:
employees.collect();