The Spark SQL Thrift server provides JDBC and ODBC interfaces for client connections to Cassandra.
When reading or writing large amounts of data, DataStax recommends using Cassandra-backed DataFrames so that the Spark Cassandra Connector, and the tuning parameters it provides, are used.
There are two instances of the hive-site.xml file.
For use with Spark, the default location of the hive-site.xml file is:
Installer-Services and Package installations: /etc/dse/spark/hive-site.xml
Installer-No Services and Tarball installations: install_location/resources/spark/conf/hive-site.xml
For use with Hive, the default location of the hive-site.xml file is:
Installer-Services and Package installations: /etc/dse/hive/hive-site.xml
Installer-No Services and Tarball installations: install_location/resources/hive/conf/hive-site.xml
Procedure
-
In the hive-site.xml file, configure Cassandra authentication credentials for the Spark SQL Thrift server. Ensure that you use the hive-site.xml file in the Spark directory:
Installer-Services and Package installations: /etc/dse/spark/hive-site.xml
Installer-No Services and Tarball installations: install_location/resources/spark/conf/hive-site.xml
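As a sketch, the credentials go into hive-site.xml as standard Hadoop-style properties. The property names cassandra.username and cassandra.password and the example values are assumptions for illustration; verify the exact names against your DSE version:

```xml
<!-- Hypothetical authentication fragment for the Spark hive-site.xml -->
<property>
  <name>cassandra.username</name>
  <value>thriftserver_user</value>
</property>
<property>
  <name>cassandra.password</name>
  <value>thriftserver_password</value>
</property>
```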
-
Start DataStax Enterprise with Spark enabled as a service or in a standalone
installation.
Note: To run index queries, start the node with both Spark and Hadoop enabled.
Running in this mode is experimental and not supported.
-
Start the server by entering the dse spark-sql-thriftserver start command as a user with permissions to write to the Spark directories.
To override the default settings for the server, pass in the configuration
property using the --hiveconf option. See the HiveServer2 documentation for a
complete list of configuration properties.
dse spark-sql-thriftserver start
By default, the server listens on port 10000 on the localhost interface of the node from which it was started. To start the server on a specific port, for example port 10001, use the --hiveconf hive.server2.thrift.port=10001 option. You can also configure the port and bind address in resources/spark/conf/spark-env.sh through the HIVE_SERVER2_THRIFT_PORT and HIVE_SERVER2_THRIFT_BIND_HOST variables.
dse spark-sql-thriftserver start --hiveconf hive.server2.thrift.port=10001
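Alternatively, the same settings can be fixed in spark-env.sh so they do not need to be passed on every start; a minimal sketch, where the bind address 10.0.0.5 is an example value for illustration:

```shell
# Hypothetical fragment for resources/spark/conf/spark-env.sh:
# pin the Thrift server to a non-default port and a specific interface.
export HIVE_SERVER2_THRIFT_PORT=10001
export HIVE_SERVER2_THRIFT_BIND_HOST=10.0.0.5
```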
-
Use Cassandra-backed DataFrames to read and write large volumes of data. For example, to create the table_a_cass_df table that uses a Cassandra-backed DataFrame while referencing table_a:
CREATE TABLE table_a_cass_df using org.apache.spark.sql.cassandra OPTIONS (table "table_a", keyspace "ks")
Note: With Cassandra-backed DataFrames, compatibility issues exist with UUID and
Inet types when inserting data with the JDBC driver.
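After the table is registered, it can be queried and written with ordinary Spark SQL statements; a sketch, assuming table_a exists in keyspace ks and that other_table is a hypothetical table with a compatible schema:

```sql
-- Read through the Cassandra-backed DataFrame
SELECT * FROM table_a_cass_df LIMIT 10;

-- Write rows back to Cassandra through the same table
INSERT INTO TABLE table_a_cass_df SELECT * FROM other_table;
```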
-
Use the Spark Cassandra Connector tuning
parameters to optimize reads and writes.
-
To stop the server, enter the dse spark-sql-thriftserver stop command.
dse spark-sql-thriftserver stop
What's next
You can now connect your application to the server over JDBC at the URI jdbc:hive2://hostname:port_number, or use the dse spark-beeline command.
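For example, to connect interactively with the bundled Beeline client; localhost and the default port 10000 are assumptions for illustration, so adjust them to your deployment:

```shell
# Launch the bundled Beeline client
dse spark-beeline
# Then, at the beeline prompt, connect to the Thrift server:
#   beeline> !connect jdbc:hive2://localhost:10000
```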