Using AlwaysOn SQL service
AlwaysOn SQL is a high availability service that responds to SQL queries from JDBC and ODBC applications.
cassandra.yaml
The location of the cassandra.yaml file depends on the type of installation:Package installations | /etc/dse/cassandra/cassandra.yaml |
Tarball installations | installation_location/resources/cassandra/conf/cassandra.yaml |
dse.yaml
The location of the dse.yaml file depends on the type of installation:Package installations | /etc/dse/dse.yaml |
Tarball installations | installation_location/resources/dse/conf/dse.yaml |
spark-alwayson-sql.conf
For use with AlwaysOn SQL, the default location of the spark-alwayson-sql.conf file is:Package installations | /etc/dse/spark/spark-alwayson-sql.conf |
Tarball installations | installation_location/resources/spark/conf/spark-alwayson-sql.conf |
AlwaysOn SQL is a high availability service that responds to SQL queries from JDBC and ODBC applications. By default, AlwaysOn SQL is disabled. It is built on top of the Spark SQL Thriftserver, but provides failover and caching between instances so there is no single point of failure. AlwaysOn SQL provides enhanced security, leveraging the same user management as the rest of DSE, executing queries to the underlying database as the user authenticated to AlwaysOn SQL.
- A running datacenter with DSE Analytics nodes enabled.
- Enabled AlwaysOn SQL on every Analytics node in the datacenter.
- Modified the replication factor for all Analytics nodes, if necessary.
- Set the
native_transport_address
in cassandra.yaml to an IP address accessible by the AlwaysOn SQL clients. This address depends on your network topology and deployment scenario. - Configured AlwaysOn SQL for security, if authentication is enabled.
When AlwaysOn SQL is enabled within an Analytics datacenter, all nodes within the datacenter must have AlwaysOn SQL enabled. Use dsetool ring to find which nodes in the datacenter are Analytics nodes.
The dse client-tool alwayson-sql
command controls the server. The command
works on the local datacenter unless you specify the datacenter with the --dc
option:
dse client-tool alwayson-sql --dc datacenter_name command
Enabling AlwaysOn SQL
Set enabled
to true
and uncomment the AlwaysOn SQL options in dse.yaml
.
Configuring AlwaysOn SQL
The
alwayson_sql_options
section in
dse.yaml, described in detail at AlwaysOn SQL options, has options for setting the ports, timeout
values, log location, and other Spark or Hive configuration settings. Additional
configuration options are located in
spark-alwayson-sql.conf.
AlwaysOn SQL binds to the native_transport_address
in
cassandra.yaml.
If you have changed some configuration settings in dse.yaml while AlwaysOn SQL is running, you can have the server pick up the new configuration by entering:
dse client-tool alwayson-sql reconfig
reconfig
:reserve_port_wait_time_ms
alwayson_sql_status_check_wait_time_ms
log_dsefs_dir
runner_max_errors
Changing other options requires a restart
, except for the
enabled
option. Enabling or disabling AlwaysOn SQL requires restarting DSE.
The spark-alwayson-sql.conf file contains Spark and Hive
settings as properties. When AlwaysOn SQL is started,
spark-alwayson-sql.conf is scanned for Spark properties,
similar to other Spark applications started with dse spark-submit
.
Properties that begin with spark.hive
are submitted as properties using
--hiveconf
, removing the spark.
prefix.
For example, if spark-alwayson-sql.conf has the following setting:
spark.hive.server2.table.type.mapping CLASSIC
That setting will be converted to --hiveconf
hive.server2.table.type.mapping=CLASSIC
when AlwaysOn SQL is started.
Configuring AlwaysOnSQL in a DSE Analytics Solo datacenter
If AlwaysOn SQL is used in a DSE Analytics Solo datacenter, modify
spark-alwayson-sql.conf to configure Spark with the DSE
Analytics Solo datacenters. In the following example, the transactional datacenter name is
dc0
and the DSE Analytics Solo datacenter is dc1
.
Under spark.master
set the Spark URI to the connect to the DSE Analytics
Solo datacenter.
spark.master=dse://?connection.local_dc=dc1
Add the spark.cassandra.connection.local_dc
property to
spark-alwayson-sql.conf and set it to the name of the
transactional datacenter.
spark.cassandra.connection.local_dc=dc0
Starting and stopping AlwaysOn SQL
If you have enabled AlwaysOn SQL, it will start when the cluster is started. If AlwaysOn SQL is enabled and DSE is restarted, AlwaysOn SQL will be started regardless of the previous state of AlwaysOn SQL. You only need to explicitly start the server if it has been stopped, for example for a configuration change.
To start AlwaysOn SQL service:
dse client-tool alwayson-sql start
To start the server on a specific datacenter, specify the datacenter name with the
--dc
option:
dse client-tool alwayson-sql --dc dc-west start
To completely stop AlwaysOn SQL service:
dse client-tool alwayson-sql stop
The server must be manually started after issuing a stop
command.
dse client-tool alwayson-sql restart
Checking the status of AlwaysOn SQL
To find the status of AlwaysOn SQL issue a status
command using
dse-client-tool
.
dse client-tool alwayson-sql status
You can also view the status in a web browser by going to http://node name
or IP address:AlwaysOn SQL web UI port
. By
default, the port is 9077. For example, if 10.10.10.1 is the IP address of an Analytics node
with AlwaysOn SQL enabled, navigate to http://10.10.10.1:9077
.
The returned status is one of:
RUNNING
: the server is running and ready to accept client requests.STOPPED_AUTO_RESTART
: the server is being started but is not yet ready to accept client requests.STOPPED_MANUAL_RESTART
: the server was stopped with either astop
orrestart
command. If the server was issued arestart
command, the status will be changed toSTOPPED_AUTO_RESTART
as the server starts again.STARTING
: the server is actively starting up but is not yet ready to accept client requests.
Caching tables within Spark SQL queries
To increase performance, you can specify tables to be cached into RAM using the
CACHE TABLE
directive. Permanent cached tables will be recached on server
restart.
You can cache an existing table by issuing a CACHE TABLE
Spark SQL command
through a client:
CACHE TABLE keyspace_name.table_name;
CACHE TABLE keyspace_name.table_name AS select statement;
The temporary cache table is only valid for the session in which it was created, and will not be recreated on server restart.
Create a permanent cache table using the CREATE CACHE TABLE
directive and
a SELECT
query:
CREATE CACHE TABLE keyspace_name.table_name AS select_statement;
The table cache can be destroyed using the UNCACHE TABLE
and CLEAR
CACHE
directives.
UNCACHE TABLE keyspace_name.table_name;
The CLEAR CACHE
directive removes the table cache.
CLEAR CACHE;
Issuing DROP TABLE
will remove all metadata including the table cache.