Automatic Spark Master election
Spark Master elections are automatically managed, and do not require any manual configuration.
DSE Analytics datacenters communicate with each other to elect one of the nodes as the Spark Master and another as the reserve Master. The Master keeps track of each Spark Worker and application, storing the information in a system table. If the Spark Master node fails, the reserve Master takes over and a new reserve Master is elected from the remaining Analytics nodes.
Each Analytics datacenter elects its own master.
For dsetool commands and options, see dsetool.
Determining the Spark Master address
You do not need to specify the Master address when configuring or using Spark with DSE Analytics. Configuring applications with a valid URL is sufficient for DSE to connect to the Master node and run the application. The following commands give information about the Spark configuration of DSE:
-
To view the URL used to configure Spark applications:

    dse client-tool spark master-address

    dse://10.200.181.62:9042?connection.local_dc=Analytics;connection.host=10.200.181.63;
-
To view the current address of the Spark Master in this datacenter:

    dse client-tool spark leader-address

    10.200.181.62
-
Workloads for Spark Master are flagged as Workload: Analytics(SM).

    dsetool ring

    Address        DC         Rack   Workload       Graph  Status  State   Load        Owns  Token                 Health [0,1]
                                                                                             0
    10.200.181.62  Analytics  rack1  Analytics(SM)  no     Up      Normal  111.91 KiB  ?     -9223372036854775808  0.10
-
Query the dse_leases.leases table to list all the masters from each datacenter with Analytics nodes:

    select * from dse_leases.leases ;

     name              | dc                   | duration_ms | epoch   | holder
    -------------------+----------------------+-------------+---------+---------------
     Leader/master/5.1 | Analytics            |       30000 |  805254 | 10.200.176.42
     Leader/master/5.1 | SearchGraphAnalytics |       30000 | 1300800 | 10.200.176.45
     Leader/master/5.1 | SearchAnalytics      |       30000 |       7 | 10.200.176.44
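The output of the commands above can also be consumed by a script. The following Python sketch is illustrative only: parse_master_url and find_spark_master are hypothetical helper names, not part of DSE, and the parsing assumes the output formats shown in the examples above (a dse:// URL with options separated by semicolons, and a Workload value in the fourth column of dsetool ring output).

```python
from urllib.parse import urlsplit


def parse_master_url(url):
    """Split a dse:// URL into contact host, port, and connection options."""
    parts = urlsplit(url.strip())
    options = {}
    # Options are separated by ';' in the sample output; skip empty trailing pieces.
    for pair in parts.query.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            options[key] = value
    return {"host": parts.hostname, "port": parts.port, "options": options}


def find_spark_master(ring_output):
    """Return the addresses of nodes whose Workload column carries the (SM) flag."""
    masters = []
    for line in ring_output.splitlines():
        fields = line.split()
        # Data rows start with an address; the workload is assumed to be the fourth column.
        if len(fields) >= 4 and fields[3].endswith("(SM)"):
            masters.append(fields[0])
    return masters
```

Both functions take the captured command output as a string, so they can be combined with any mechanism for running dse client-tool or dsetool on a node.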
Ensure that the replication factor is configured correctly for the dse_leases keyspace
If the dse_leases keyspace is not properly replicated, the Spark Master might not be elected.
Every time you add a new datacenter, you must manually increase the replication factor of the dse_leases keyspace.
The initial node in a multi-datacenter cluster has a replication factor of 1 for the dse_leases keyspace.
For new datacenters, the first node is created with the dse_leases keyspace with a replication factor of 1 for that datacenter.
However, any datacenters that you add have a replication factor of 0 and require configuration before you start DSE Analytics nodes.
You must change the replication factor of the dse_leases keyspace for multiple analytics datacenters.
See Setting the replication factor for analytics keyspaces.
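As a minimal sketch of the replication change described here, the following Python helper builds an ALTER KEYSPACE statement for dse_leases from a mapping of datacenter names to replication factors. The function name alter_dse_leases_statement is hypothetical, the datacenter names Analytics1 and Analytics2 and the factor 3 are taken from the troubleshooting example later on this page, and the helper assumes datacenter names contain no quote characters. It only produces the statement text; running it in cqlsh and following up with nodetool repair remain manual steps.

```python
def alter_dse_leases_statement(replication_by_dc):
    """Build an ALTER KEYSPACE statement for dse_leases from {datacenter: RF}."""
    if not replication_by_dc:
        raise ValueError("at least one datacenter is required")
    for dc, rf in replication_by_dc.items():
        if not isinstance(rf, int) or rf < 1:
            raise ValueError(f"replication factor for {dc!r} must be a positive integer")
    # Render each datacenter as a quoted key/value pair of the CQL map literal.
    options = ", ".join(f"'{dc}': '{rf}'" for dc, rf in replication_by_dc.items())
    return (
        "ALTER KEYSPACE dse_leases WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Illustrative datacenter names and factors, not a recommendation.
print(alter_dse_leases_statement({"Analytics1": 3, "Analytics2": 3}))
```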
Monitoring the lease subsystem
All changes to lease holders are recorded in the dse_leases.logs table.
Most of the time, you do not want to enable logging.
-
To turn on logging, ensure that lease_metrics_options is enabled in the dse.yaml file:

    lease_metrics_options:
        enabled: true
        ttl_seconds: 604800

The location of the dse.yaml file depends on the type of installation:

 Installation Type                                           | Location
-------------------------------------------------------------+-----------------------------------------------------
 Package installations + Installer-Services installations    | /etc/dse/dse.yaml
 Tarball installations + Installer-No Services installations | <installation_location>/resources/dse/conf/dse.yaml
-
Look at the dse_leases.logs table:

    select * from dse_leases.logs ;

     name              | dc  | monitor       | at                              | new_holder    | old_holder
    -------------------+-----+---------------+---------------------------------+---------------+------------
     Leader/master/5.1 | dc1 | 10.200.180.44 | 2018-05-17 00:45:02.971000+0000 | 10.200.180.44 |
     Leader/master/5.1 | dc1 | 10.200.180.49 | 2018-05-17 02:37:07.381000+0000 | 10.200.180.49 |
-
When lease_metrics_options is enabled, you can examine the acquire, renew, resolve, and disable operations. Most of the time, these operations should complete in 100 ms or less:

    select * from dse_perf.leases ;

     name              | dc  | monitor       | acquire_average_latency_ms | acquire_latency99ms | acquire_max_latency_ms | acquire_rate15 | disable_average_latency_ms | disable_latency99ms | disable_max_latency_ms | disable_rate15 | renew_average_latency_ms | renew_latency99ms | renew_max_latency_ms | renew_rate15 | resolve_average_latency_ms | resolve_latency99ms | resolve_max_latency_ms | resolve_rate15 | up   | up_or_down_since
    -------------------+-----+---------------+----------------------------+---------------------+------------------------+----------------+----------------------------+---------------------+------------------------+----------------+--------------------------+-------------------+----------------------+--------------+----------------------------+---------------------+------------------------+----------------+------+---------------------------------
     Leader/master/5.1 | dc1 | 10.200.180.44 |                          0 |                   0 |                      0 |              0 |                          0 |                   0 |                      0 |              0 |                       24 |               100 |                  100 |            0 |                          8 |                  26 |                     26 |              0 | True | 2018-05-03 19:30:38.395000+0000
     Leader/master/5.1 | dc1 | 10.200.180.49 |                          0 |                   0 |                      0 |              0 |                          0 |                   0 |                      0 |              0 |                        0 |                 0 |                    0 |            0 |                         10 |                  32 |                     32 |              0 | True | 2018-05-03 19:30:55.656000+0000
-
If the log warnings and errors do not contain relevant information, edit the logback.xml file and add:

    <logger name="com.datastax.bdp.leasemanager" level="DEBUG"/>

The location of the logback.xml file depends on the type of installation:

 Installation Type                                           | Location
-------------------------------------------------------------+---------------------------------------------------------------
 Package installations + Installer-Services installations    | /etc/dse/cassandra/logback.xml
 Tarball installations + Installer-No Services installations | <installation_location>/resources/cassandra/conf/logback.xml
-
Restart the node for the debugging settings to take effect.
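The 100 ms guideline for lease operations can be checked mechanically. The sketch below assumes that rows from dse_perf.leases have already been fetched by some client and are available as Python dictionaries keyed by column name; the function name slow_lease_operations and the way the rows are obtained are assumptions, not part of DSE.

```python
THRESHOLD_MS = 100  # the guideline above: operations should usually finish in 100 ms or less


def slow_lease_operations(rows, threshold_ms=THRESHOLD_MS):
    """Return (monitor, column, value) for every latency column above the threshold."""
    slow = []
    for row in rows:
        for column, value in row.items():
            # Only latency columns are compared; rate15 and identity columns are ignored.
            if "latency" in column and value is not None and value > threshold_ms:
                slow.append((row["monitor"], column, value))
    return slow


# Values copied from the first sample row above (renew columns only).
sample = [{
    "name": "Leader/master/5.1", "dc": "dc1", "monitor": "10.200.180.44",
    "renew_average_latency_ms": 24, "renew_latency99ms": 100,
    "renew_max_latency_ms": 100, "renew_rate15": 0,
}]
print(slow_lease_operations(sample))  # prints [] because 100 ms does not exceed the threshold
```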
Troubleshooting
Perform these lease holder troubleshooting activities before you contact DataStax Support.
-
Verify the workload status

Run the dsetool ring command:

    dsetool ring

If the replication factor is inadequate or if the replicas are down, the output of the dsetool ring command contains a warning:

    Address         DC                    Rack   Workload             Graph  Status  State   Load        Owns  Token                 Health [0,1]
                                                                                                               0
    10.200.178.232  SearchGraphAnalytics  rack1  SearchAnalytics      yes    Up      Normal  153.04 KiB  ?     -9223372036854775808  0.00
    10.200.178.230  SearchGraphAnalytics  rack1  SearchAnalytics(SM)  yes    Up      Normal  92.98 KiB   ?     0                     0.000

If the automatic Job Tracker or Spark Master election fails, verify that an appropriate replication factor is set for the dse_leases keyspace.
-
Use cqlsh commands to verify the replication factor of the analytics keyspaces
  -
  Describe the dse_leases keyspace:

      DESCRIBE KEYSPACE dse_leases;

      CREATE KEYSPACE dse_leases WITH replication = {'class': 'NetworkTopologyStrategy', 'Analytics1': '1'} AND durable_writes = true;
  -
  Increase the replication factor of the dse_leases keyspace:

      ALTER KEYSPACE dse_leases WITH replication = {'class': 'NetworkTopologyStrategy', 'Analytics1': '3', 'Analytics2': '3'};
  -
  Run nodetool repair.
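The check in the first step can be sketched in Python. The helper below takes the CREATE KEYSPACE text printed by DESCRIBE KEYSPACE and reports which Analytics datacenters have no replicas of dse_leases. The function names are hypothetical, the list of datacenter names must be supplied by the caller, and the sketch assumes NetworkTopologyStrategy with the single-quoted map format shown above.

```python
import ast
import re


def replication_map(create_statement):
    """Extract the replication options from a CREATE KEYSPACE statement."""
    match = re.search(r"replication\s*=\s*(\{.*?\})", create_statement, re.DOTALL)
    if match is None:
        raise ValueError("no replication map found")
    # The CQL map uses single-quoted keys and values, so it also parses as a Python dict literal.
    return ast.literal_eval(match.group(1))


def datacenters_missing_replication(create_statement, analytics_dcs):
    """List datacenters with no replicas of the keyspace (absent from the map or RF 0)."""
    options = replication_map(create_statement)
    return [dc for dc in analytics_dcs if int(options.get(dc, 0)) < 1]


described = (
    "CREATE KEYSPACE dse_leases WITH replication = "
    "{'class': 'NetworkTopologyStrategy', 'Analytics1': '1'} AND durable_writes = true;"
)
print(datacenters_missing_replication(described, ["Analytics1", "Analytics2"]))
```

With the sample statement, Analytics2 is reported because it does not appear in the replication map, which matches the situation the ALTER KEYSPACE step corrects.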
-