Using the Spark SQL Thriftserver
The Spark SQL Thriftserver provides JDBC and ODBC interfaces for client connections to the database.
The AlwaysOn SQL service is a high-availability service built on top of the Spark SQL Thriftserver. The Spark SQL Thriftserver is started manually on a single node in an Analytics datacenter and does not fail over to another node. Both AlwaysOn SQL and the Spark SQL Thriftserver provide JDBC and ODBC interfaces to DSE, and they share many configuration settings.
hive-site.xml
For use with Spark, the default location of the hive-site.xml file is:
- Package installations: /etc/dse/spark/hive-site.xml
- Tarball installations: installation_location/resources/spark/conf/hive-site.xml
Procedure
- If you are using Kerberos authentication, configure your authentication credentials for the Spark SQL Thriftserver in the hive-site.xml file:
<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>thriftserver/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/etc/dse/dse.keytab</value>
</property>
Ensure that you use the hive-site.xml file in the Spark directory:
- Package installations: /etc/dse/spark/hive-site.xml
- Tarball installations: installation_location/resources/spark/conf/hive-site.xml
- Start DataStax Enterprise with Spark enabled as a service or in a standalone installation.
- Start the server by entering the dse spark-sql-thriftserver start command as a user with permissions to write to the Spark directories.
To override the default settings for the server, pass in the configuration property using the --hiveconf option. See the HiveServer2 documentation for a complete list of configuration properties.
dse spark-sql-thriftserver start
By default, the server listens on port 10000 on the localhost interface of the node from which it was started. You can start the server on a specific port. For example, to start the server on port 10001, use the --hiveconf hive.server2.thrift.port=10001 option.
dse spark-sql-thriftserver start --hiveconf hive.server2.thrift.port=10001
You can configure the port and bind address permanently in resources/spark/conf/spark-env.sh:
export HIVE_SERVER2_THRIFT_PORT=10001
export HIVE_SERVER2_THRIFT_BIND_HOST=1.1.1.1
You can specify general Spark configuration settings by using the --conf option:
dse spark-sql-thriftserver start --conf spark.cores.max=4
- Use DataFrames to read and write large volumes of data. For example, to create the table_a_cass_df table that uses a DataFrame while referencing table_a:
CREATE TABLE table_a_cass_df using org.apache.spark.sql.cassandra OPTIONS (table "table_a", keyspace "ks")
Note: With DataFrames, compatibility issues exist with UUID and Inet types when inserting data with the JDBC driver.
- Use the Spark Cassandra Connector tuning parameters to optimize reads and writes.
- To stop the server, enter the dse spark-sql-thriftserver stop command.
dse spark-sql-thriftserver stop
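The tuning step above can be sketched as a start command that passes Spark Cassandra Connector properties through the --conf option. The property values shown are illustrative assumptions for this sketch, not recommendations; tune them against your own workload.

```shell
# Sketch: pass Spark Cassandra Connector tuning properties when starting
# the Thriftserver (values below are illustrative assumptions).
# spark.cassandra.input.split.sizeInMB     - approximate size of each Spark partition read
# spark.cassandra.input.fetch.sizeInRows   - rows fetched per round trip on reads
# spark.cassandra.output.concurrent.writes - concurrent asynchronous write batches per task
dse spark-sql-thriftserver start \
  --conf spark.cassandra.input.split.sizeInMB=128 \
  --conf spark.cassandra.input.fetch.sizeInRows=2000 \
  --conf spark.cassandra.output.concurrent.writes=10
```

Because these are passed with --conf, they apply to the Spark application backing the Thriftserver, so all queries served by it share the same tuning.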
What's next
You can now connect your application to the server by using the Simba JDBC driver at the URI:
jdbc:hive2://hostname:port_number
You can also connect by using the Simba ODBC driver or dse beeline.
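A minimal connection check with dse beeline might look like the following; the hostname and port are assumptions that match a server started on the defaults (localhost, port 10000), so adjust them to your deployment.

```shell
# Sketch: connect beeline to the Thriftserver and run a probe query.
# localhost:10000 is an assumption matching the default start command.
dse beeline -u jdbc:hive2://localhost:10000 -e "SHOW TABLES;"
```

If the connection succeeds, beeline prints the tables visible to the Spark SQL session and exits.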