Using the Apache Spark™ SQL Thrift server
The Spark SQL Thrift server provides JDBC and ODBC interfaces so that clients can connect to the database.
When reading or writing large amounts of data, DataStax recommends using DataFrames so that reads and writes go through the Spark Cassandra Connector and benefit from its tuning parameters.
Procedure
-
If you are using Kerberos authentication, configure your authentication credentials for the Spark SQL Thrift server in the hive-site.xml file:

```xml
<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>thriftserver/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/etc/dse/dse.keytab</value>
</property>
```

Ensure that you use the hive-site.xml file in the Spark directory.
Where is the hive-site.xml file?
The location of the hive-site.xml file depends on the type of installation:

Installation Type | Location
---|---
Package installations and Installer-Services installations | /etc/dse/spark/hive-site.xml
Tarball installations and Installer-No Services installations | installation_location/resources/spark/conf/hive-site.xml
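Before starting the server, it can help to confirm that the configured keytab and principal are usable. A minimal sketch, assuming MIT Kerberos client tools are installed and using the example principal and keytab path from hive-site.xml above:

```shell
# List the principals stored in the keytab (klist -kt shows keytab entries).
klist -kt /etc/dse/dse.keytab

# Acquire a ticket as the Thrift server principal; substitute your node's FQDN
# and realm -- these are the example values from hive-site.xml.
kinit -kt /etc/dse/dse.keytab "thriftserver/$(hostname -f)@EXAMPLE.COM"
```

If kinit returns without an error, the keytab entry matches what the KDC expects for this host.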
-
Start DataStax Enterprise with Spark enabled as a service or in a standalone installation.
-
Start the server by entering the dse spark-sql-thriftserver start command as a user with permissions to write to the Spark directories. To override the default settings for the server, pass in configuration properties using the --hiveconf option. See the HiveServer2 documentation for a complete list of configuration properties.

```shell
dse spark-sql-thriftserver start
```

By default, the server listens on port 10000 on the localhost interface of the node from which it was started. To start the server on a specific port, use the --hiveconf hive.server2.thrift.port option. For example, to start the server on port 10001:

```shell
dse spark-sql-thriftserver start --hiveconf hive.server2.thrift.port=10001
```

You can also configure the port and bind address in resources/spark/conf/spark-env.sh by setting the HIVE_SERVER2_THRIFT_PORT and HIVE_SERVER2_THRIFT_BIND_HOST variables.

You can specify general Spark configuration settings by using the --conf option:

```shell
dse spark-sql-thriftserver start --conf spark.cores.max=4
```
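The --hiveconf and --conf options can be combined in a single invocation. A sketch, reusing the example port and core limit shown above:

```shell
# Start the Thrift server on port 10001 while capping the Spark job at 4 cores.
dse spark-sql-thriftserver start \
  --hiveconf hive.server2.thrift.port=10001 \
  --conf spark.cores.max=4
```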
-
Use DataFrames to read and write large volumes of data. For example, to create the table_a_cass_df table, which uses a DataFrame that references table_a:

```sql
CREATE TABLE table_a_cass_df
  USING org.apache.spark.sql.cassandra
  OPTIONS (table "table_a", keyspace "ks")
```

With DataFrames, compatibility issues exist with the UUID and Inet types when inserting data with the JDBC driver.
-
Use the Spark Cassandra Connector tuning parameters to optimize reads and writes.
-
To stop the server, enter the dse spark-sql-thriftserver stop command:

```shell
dse spark-sql-thriftserver stop
```
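The tuning step above can be sketched as a server start that passes Spark Cassandra Connector tuning parameters through the --conf option. The property names below come from the connector's reference configuration, but treat the specific names and values as assumptions to verify against your connector version and workload:

```shell
# Assumed tuning values for illustration only: larger input splits for reads
# and more concurrent batches for writes. Verify the property names against
# your Spark Cassandra Connector version before relying on them.
dse spark-sql-thriftserver start \
  --conf spark.cassandra.input.split.sizeInMB=128 \
  --conf spark.cassandra.output.concurrent.writes=10
```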
Next steps
You can now connect your application to the server using JDBC at the URI jdbc:hive2://hostname:port, using ODBC, or with dse spark-beeline.
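For example, a sketch of connecting with dse spark-beeline and querying the table created earlier. The host, port, and table name are assumptions carried over from the examples above, and the -u and -e flags are standard Beeline options, which this sketch assumes are passed through to Beeline:

```shell
# Connect to the Thrift server (assumed here to be on localhost:10001)
# and run a single query against the DataFrame-backed table.
dse spark-beeline -u jdbc:hive2://localhost:10001 \
  -e 'SELECT * FROM table_a_cass_df LIMIT 10;'
```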