Automatic Spark Master election
logback.xml
The location of the logback.xml file depends on the type of installation:
Package installations | /etc/dse/cassandra/logback.xml |
Tarball installations | installation_location/resources/cassandra/conf/logback.xml |
dse.yaml
The location of the dse.yaml file depends on the type of installation:
Package installations | /etc/dse/dse.yaml |
Tarball installations | installation_location/resources/dse/conf/dse.yaml |
Spark Master elections are automatically managed and do not require any manual configuration.
The nodes in a DSE Analytics datacenter communicate with each other to elect one node as the Spark Master and another as the reserve Master. The Master keeps track of each Spark Worker and application, storing the information in a system table. If the Spark Master node fails, the reserve Master takes over and a new reserve Master is elected from the remaining Analytics nodes.
Each Analytics datacenter elects its own master.
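The Master/reserve failover described above can be illustrated with a small toy model. This is a hypothetical sketch, not DSE's lease-manager implementation: the class name `AnalyticsDatacenter`, its methods, and the rule of picking the first available node are assumptions made purely for illustration. Because each Analytics datacenter elects its own master, you would model one instance per datacenter.

```python
from typing import List, Optional


class AnalyticsDatacenter:
    """Hypothetical toy model of Spark Master failover within one datacenter.

    Models only the documented behavior: one node is the Master, another is
    the reserve Master, and when the Master fails the reserve takes over and a
    new reserve is chosen from the remaining nodes. Choosing the first free
    node is an arbitrary simplification, not how DSE elects nodes.
    """

    def __init__(self, nodes: List[str]) -> None:
        self.live: List[str] = list(nodes)
        self.master: Optional[str] = None
        self.reserve: Optional[str] = None
        self._fill_roles()

    def _fill_roles(self) -> None:
        # Give each empty role to the first live node that has no role yet.
        for role in ("master", "reserve"):
            if getattr(self, role) is None:
                free = [n for n in self.live if n not in (self.master, self.reserve)]
                if free:
                    setattr(self, role, free[0])

    def fail(self, node: str) -> None:
        """Remove a failed node; promote the reserve if the Master failed."""
        self.live.remove(node)
        if node == self.master:
            self.master, self.reserve = self.reserve, None
        elif node == self.reserve:
            self.reserve = None
        self._fill_roles()


# Usage with placeholder addresses: the reserve takes over, a new reserve is chosen.
dc = AnalyticsDatacenter(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
dc.fail(dc.master)
print(dc.master, dc.reserve)  # 10.0.0.2 10.0.0.3
```

The model keeps roles and liveness separate so that losing the reserve (rather than the Master) only triggers a new reserve election and leaves the Master in place.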
For dsetool commands and options, see dsetool.
Determining the Spark Master address
- To view the URL used to configure Spark applications:
dse client-tool spark master-address
dse://10.200.181.62:9042?connection.local_dc=Analytics;connection.host=10.200.181.63;
- To view the current address of the Spark Master in this datacenter:
dse client-tool spark leader-address
10.200.181.62
- In the dsetool ring output, the Workload column flags the Spark Master node with (SM), for example Analytics(SM):
dsetool ring
Address        DC         Rack   Workload       Graph  Status  State   Load        Owns  Token                 Health [0,1]
                                                                                         0
10.200.181.62  Analytics  rack1  Analytics(SM)  no     Up      Normal  111.91 KiB  ?     -9223372036854775808  0.10
- Query the dse_leases.leases table to list the masters of all datacenters that have Analytics nodes:
select * from dse_leases.leases ;
 name              | dc                   | duration_ms | epoch   | holder
-------------------+----------------------+-------------+---------+---------------
 Leader/master/6.0 | Analytics            |       30000 |  805254 | 10.200.176.42
 Leader/master/6.0 | SearchGraphAnalytics |       30000 | 1300800 | 10.200.176.45
 Leader/master/6.0 | SearchAnalytics      |       30000 |       7 | 10.200.176.44
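If you script checks against the dse_leases.leases query above, the following sketch extracts the Spark Master holder for each datacenter from cqlsh's tabular output. It assumes the default cqlsh layout (a header row, a dashed separator row, then `|`-separated data rows, optionally followed by a row-count footer) and that Spark Master leases are named with the `Leader/master/` prefix, as in the sample output. The function name `masters_by_dc` is hypothetical.

```python
from typing import Dict


def masters_by_dc(cqlsh_output: str) -> Dict[str, str]:
    """Map each datacenter to its Spark Master holder (hypothetical helper).

    Assumes default cqlsh table output: header row, '---+---' separator row,
    then one '|'-separated row per lease.
    """
    lines = [ln for ln in cqlsh_output.splitlines() if ln.strip()]
    header = [col.strip() for col in lines[0].split("|")]
    result: Dict[str, str] = {}
    for line in lines[1:]:
        if set(line.strip()) <= set("-+"):
            continue  # skip the dashed separator row
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) != len(header):
            continue  # skip footer lines such as "(3 rows)"
        row = dict(zip(header, cells))
        if row["name"].startswith("Leader/master/"):
            result[row["dc"]] = row["holder"]
    return result


# Sample copied from the dse_leases.leases output above.
SAMPLE = """
 name              | dc                   | duration_ms | epoch   | holder
-------------------+----------------------+-------------+---------+---------------
 Leader/master/6.0 | Analytics            |       30000 |  805254 | 10.200.176.42
 Leader/master/6.0 | SearchGraphAnalytics |       30000 | 1300800 | 10.200.176.45
 Leader/master/6.0 | SearchAnalytics      |       30000 |       7 | 10.200.176.44
"""
print(masters_by_dc(SAMPLE)["Analytics"])  # 10.200.176.42
```

Parsing by header name rather than column position keeps the helper working if the column order in the output changes.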
Ensure that the replication factor is configured correctly for the dse_leases keyspace
If the dse_leases keyspace is not properly replicated, the Spark Master might not be elected. When you add a new DSE Analytics datacenter, you must increase the replication factor of the dse_leases keyspace for that datacenter. If DataStax Enterprise or Spark security options are enabled on the cluster, you must also increase the replication factor for the dse_security keyspace across all logical datacenters.
The dse_leases keyspace is created with a replication factor of 1 for the datacenter of the first node. However, any datacenters that you add have a replication factor of 0 and must be configured before you start DSE Analytics nodes in them. If you have multiple Analytics datacenters, you must change the replication factor of the dse_leases keyspace. See Setting the replication factor for analytics keyspaces.
Monitoring the lease subsystem
Lease holder changes are recorded in the dse_leases.logs table. Most of the time, you do not want to enable logging.
- To turn on logging, ensure that lease_metrics_options is enabled in the dse.yaml file:
lease_metrics_options:
    enabled: true
    ttl_seconds: 604800
- Look at the dse_leases.logs table:
select * from dse_leases.logs ;
 name              | dc  | monitor       | at                              | new_holder    | old_holder
-------------------+-----+---------------+---------------------------------+---------------+------------
 Leader/master/6.0 | dc1 | 10.200.180.44 | 2018-05-17 00:45:02.971000+0000 | 10.200.180.44 |
 Leader/master/6.0 | dc1 | 10.200.180.49 | 2018-05-17 02:37:07.381000+0000 | 10.200.180.49 |
- When lease_metrics_options is enabled, you can examine the latency of the acquire, renew, resolve, and disable operations. Most of the time, these operations should complete in 100 ms or less:
select * from dse_perf.leases ;
 name              | dc  | monitor       | acquire_average_latency_ms | acquire_latency99ms | acquire_max_latency_ms | acquire_rate15 | disable_average_latency_ms | disable_latency99ms | disable_max_latency_ms | disable_rate15 | renew_average_latency_ms | renew_latency99ms | renew_max_latency_ms | renew_rate15 | resolve_average_latency_ms | resolve_latency99ms | resolve_max_latency_ms | resolve_rate15 | up   | up_or_down_since
-------------------+-----+---------------+----------------------------+---------------------+------------------------+----------------+----------------------------+---------------------+------------------------+----------------+--------------------------+-------------------+----------------------+--------------+----------------------------+---------------------+------------------------+----------------+------+---------------------------------
 Leader/master/6.0 | dc1 | 10.200.180.44 |                          0 |                   0 |                      0 |              0 |                          0 |                   0 |                      0 |              0 |                       24 |               100 |                  100 |            0 |                          8 |                  26 |                     26 |              0 | True | 2018-05-03 19:30:38.395000+0000
 Leader/master/6.0 | dc1 | 10.200.180.49 |                          0 |                   0 |                      0 |              0 |                          0 |                   0 |                      0 |              0 |                        0 |                 0 |                    0 |            0 |                         10 |                  32 |                     32 |              0 | True | 2018-05-03 19:30:55.656000+0000
- If the log warnings and errors do not contain relevant information, edit the logback.xml file and add:
<logger name="com.datastax.bdp.leasemanager" level="DEBUG"/>
- Restart the node for the debugging settings to take effect.
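To spot lease operations that exceed the 100 ms guideline from the dse_perf.leases query above, you could check the 99th-percentile latency columns. This is a minimal sketch: it assumes you have already fetched the rows as Python dictionaries keyed by column name (for example with a CQL driver), and both the function name `slow_lease_operations` and the choice of the `*_latency99ms` columns as the signal are illustrative assumptions.

```python
from typing import List, Tuple

OPERATIONS = ("acquire", "renew", "resolve", "disable")
THRESHOLD_MS = 100  # guideline from the text: most operations complete in 100 ms or less


def slow_lease_operations(rows: List[dict], threshold_ms: int = THRESHOLD_MS) -> List[Tuple[str, str, int]]:
    """Return (monitor, operation, p99 latency in ms) for each operation over the threshold."""
    slow = []
    for row in rows:
        for op in OPERATIONS:
            p99 = int(row[f"{op}_latency99ms"])
            if p99 > threshold_ms:
                slow.append((str(row["monitor"]), op, p99))
    return slow


# Values copied from the sample dse_perf.leases output (only the columns used here).
rows = [
    {"monitor": "10.200.180.44", "acquire_latency99ms": 0, "renew_latency99ms": 100,
     "resolve_latency99ms": 26, "disable_latency99ms": 0},
    {"monitor": "10.200.180.49", "acquire_latency99ms": 0, "renew_latency99ms": 0,
     "resolve_latency99ms": 32, "disable_latency99ms": 0},
]
print(slow_lease_operations(rows))  # []
print(slow_lease_operations(rows, threshold_ms=30))
# [('10.200.180.44', 'renew', 100), ('10.200.180.49', 'resolve', 32)]
```

The comparison is strict (`>`), so a p99 of exactly 100 ms, as in the renew row of the sample, counts as within the guideline.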
Troubleshooting
- Verify the workload status
- Run the dsetool ring command:
dsetool ring
If the replication factor is inadequate or if the replicas are down, the output of the dsetool ring command contains a warning:
If the automatic Job Tracker or Spark Master election fails, verify that an appropriate replication factor is set for the dse_leases keyspace.
Address         DC                    Rack   Workload             Graph  Status  State   Load        Owns  Token                 Health [0,1]
                                                                                                          0
10.200.178.232  SearchGraphAnalytics  rack1  SearchAnalytics      yes    Up      Normal  153.04 KiB  ?     -9223372036854775808  0.00
10.200.178.230  SearchGraphAnalytics  rack1  SearchAnalytics(SM)  yes    Up      Normal  92.98 KiB   ?     0                     0.000
- Use cqlsh commands to verify the replication factor of the analytics keyspaces:
- Describe the dse_leases keyspace:
DESCRIBE KEYSPACE dse_leases;
CREATE KEYSPACE dse_leases WITH replication = {'class': 'NetworkTopologyStrategy', 'Analytics1': '1'} AND durable_writes = true;
- Increase the replication factor of the dse_leases keyspace:
ALTER KEYSPACE dse_leases WITH replication = {'class': 'NetworkTopologyStrategy', 'Analytics1': '3', 'Analytics2': '3'};
- Run nodetool repair.
- Describe the