Automatic Job Tracker and Spark Master
Job Tracker and Spark Master elections are automatically managed.
Nodes in a DSE Analytics datacenter communicate with each other to elect one node as the Spark Master and another as the reserve Master.
Note: DSE requires a QUORUM of nodes to elect the Spark Master, so ensure that enough nodes in the datacenter are active to achieve a QUORUM. For example, if a datacenter has only 2 nodes and one node is down, DSE cannot elect the Spark Master because there are not enough nodes available to achieve a QUORUM.
The Master keeps track of each Spark Worker and application, storing the information in a system table. If the Spark Master node fails, the reserve Master takes over and a new reserve Master is elected from the remaining Analytics nodes.
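The QUORUM requirement in the note above is a majority. The Python sketch below computes it as floor(n / 2) + 1, which is how Cassandra derives QUORUM from a count of replicas, and shows why a 2-node datacenter with one node down cannot elect a Spark Master while a 3-node datacenter can. The helper names quorum and can_elect_master are hypothetical and exist only for illustration.

```python
def quorum(n: int) -> int:
    """Smallest majority of n nodes: floor(n / 2) + 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n // 2 + 1


def can_elect_master(total_nodes: int, live_nodes: int) -> bool:
    """True if enough live nodes remain to reach a quorum (illustrative only)."""
    return live_nodes >= quorum(total_nodes)


if __name__ == "__main__":
    # 2-node datacenter, one node down: quorum is 2 but only 1 node is live.
    print(quorum(2), can_elect_master(2, 1))  # 2 False
    # 3-node datacenter, one node down: quorum is 2 and 2 nodes are live.
    print(quorum(3), can_elect_master(3, 2))  # 2 True
```

By the same arithmetic, 3 replicas tolerate the loss of one node, which matches the replication factor of 3 that the dsetool ring warning in the Troubleshooting section recommends.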
Spark jobs do not need to know the Spark Master address. When you start a job using dse spark-submit, the Master is discovered automatically.
For dsetool commands and options, see dsetool.
Determining the Job Tracker and Spark Master address
Use these commands to determine the Job Tracker and Spark Master addresses:
- To view the current address of the Hadoop Job Tracker in this datacenter:
  dse client-tool hadoop job-tracker-address
  10.200.176.232:8012
- To view the current address of the Spark Master in this datacenter:
  dse client-tool spark master-address
  10.200.176.232:8012
- In the dsetool ring output, the workload of the Spark Master and Job Tracker node is flagged as Analytics(JT):
  dsetool ring
  10.200.176.232 Analytics rack1 Analytics(JT) no Up Normal 240.59 KB 33.33% 3074457345618258602 0.64
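If you script this check, the node flagged Analytics(JT) can be picked out of the dsetool ring output. The Python sketch below assumes the whitespace-separated layout of the sample line above (address in the first column, workload in the fourth); other DSE versions may format the output differently. The function names jt_nodes and run_dsetool_ring are hypothetical.

```python
import subprocess
from typing import List


def jt_nodes(ring_output: str) -> List[str]:
    """Return addresses whose Workload column is flagged with (JT).

    Assumes the layout of the sample output: address in column 1,
    workload in column 4, columns separated by whitespace.
    """
    nodes = []
    for line in ring_output.splitlines():
        cols = line.split()
        if len(cols) >= 4 and cols[3].endswith("(JT)"):
            nodes.append(cols[0])
    return nodes


def run_dsetool_ring() -> str:
    """Capture `dsetool ring` output (requires dsetool on the PATH)."""
    result = subprocess.run(
        ["dsetool", "ring"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    sample = (
        "10.200.176.232 Analytics rack1 Analytics(JT) no Up Normal "
        "240.59 KB 33.33% 3074457345618258602 0.64"
    )
    print(jt_nodes(sample))  # ['10.200.176.232']
```

Header lines and lone token lines are skipped naturally, because their fourth column is either missing or does not end with (JT).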
Ensure that the replication factor is configured correctly for the dse_leases keyspace
If the dse_leases keyspace is not properly replicated, the Hadoop Job Trackers and
Spark Masters might not be elected.
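One way to catch an under-replicated dse_leases keyspace before it blocks an election is to compare its replication map with the Analytics datacenters you run. The Python sketch below is hypothetical: it takes the replication map as a dict like the one printed by DESCRIBE KEYSPACE, treats a datacenter missing from the map as having a replication factor of 0, assumes NetworkTopologyStrategy, and uses a default target of 3 replicas taken from the dsetool ring warning in the Troubleshooting section.

```python
from typing import Dict, Iterable


def under_replicated(replication: Dict[str, str], analytics_dcs: Iterable[str],
                     target: int = 3) -> Dict[str, int]:
    """Map each Analytics datacenter below `target` replicas to its current
    replication factor; a datacenter missing from the map counts as 0."""
    result = {}
    for dc in analytics_dcs:
        current = int(replication.get(dc, "0"))
        if current < target:
            result[dc] = current
    return result


def alter_statement(replication: Dict[str, str], analytics_dcs: Iterable[str],
                    target: int = 3, keyspace: str = "dse_leases") -> str:
    """Build an ALTER KEYSPACE statement (assuming NetworkTopologyStrategy)
    that raises every Analytics datacenter to at least `target` replicas and
    keeps the factors of other datacenters unchanged."""
    new_map = {k: v for k, v in replication.items() if k != "class"}
    for dc in analytics_dcs:
        new_map[dc] = str(max(int(new_map.get(dc, "0")), target))
    entries = ", ".join(f"'{dc}': '{rf}'" for dc, rf in new_map.items())
    return (f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {entries}}};")
```

For the sample keyspace in the Troubleshooting section ({'Analytics1': '1'}) with Analytics datacenters Analytics1 and Analytics2, alter_statement returns the same ALTER KEYSPACE statement shown there.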
Important: Every time you add a new datacenter, you must manually increase the replication factor of the dse_leases keyspace for the new DSE Analytics datacenter. If DataStax Enterprise or Spark security options are enabled on the cluster, you must also increase the replication factor for the dse_security keyspace across all logical datacenters.
The first node in a new cluster creates the dse_leases keyspace with a replication factor of 1 for its datacenter. However, any datacenters that you add have a replication factor of 0 and require configuration before you start DSE Analytics nodes. If you have multiple Analytics datacenters, you must change the replication factor of the dse_leases keyspace for each of them. See Setting the replication factor for analytics keyspaces.
Monitoring the lease subsystem
All changes to lease holders are recorded in the dse_leases.logs table. Most of the
time, you do not want to enable logging.
- To turn on logging, ensure that lease_metrics_options is enabled in the dse.yaml file:
  lease_metrics_options:
      enabled: true
      ttl_seconds: 604800
- Look at the dse_leases.logs table:
  cqlsh> select * from dse_leases.logs;
   name     | dc        | monitor        | at                       | new_holder     | old_holder
  ----------+-----------+----------------+--------------------------+----------------+-----------
   HadoopJT | Analytics | 10.200.176.231 | 2016-03-15 00:16:49+0000 | 10.200.176.231 |
- When lease_metrics_options is enabled, you can examine the acquire, renew, resolve, and disable operations. Most of the time, these operations should complete in 100 ms or less:
  cqlsh> select * from dse_perf.leases;
   name | dc | monitor | acquire_average_latency_ms | acquire_latency99ms | acquire_max_latency_ms | acquire_rate15 | disable_average_latency_ms | disable_latency99ms | disable_max_latency_ms | disable_rate15 | renew_average_latency_ms | renew_latency99ms | renew_max_latency_ms | renew_rate15 | resolve_average_latency_ms | resolve_latency99ms | resolve_max_latency_ms | resolve_rate15 | up | up_or_down_since
  --------------------------------------------------------------------------------
   HadoopJT | Analytics | 10.200.176.231 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 58 | 104 | 115 | 0 | True | 2016-03-16 19:31:45+0000
- If the log warnings and errors do not contain relevant information, edit the logback.xml file and add:
  <logger name="com.datastax.bdp.leasemanager" level="DEBUG"/>
- Restart the node for the debugging settings to take effect.
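The 100 ms guideline for lease operations can be checked mechanically. This Python sketch takes one row of dse_perf.leases as a dict keyed by the column names in the sample output (fetching the row, for example with a CQL driver, is left out) and reports the operations whose chosen latency column exceeds a threshold. Comparing the 99th-percentile column by default is an assumption; pass metric="average_latency_ms" to compare averages instead. The function name slow_operations is hypothetical.

```python
from typing import Dict, List

# Operation prefixes used in the dse_perf.leases column names.
OPERATIONS = ("acquire", "renew", "resolve", "disable")


def slow_operations(row: Dict[str, float], threshold_ms: float = 100.0,
                    metric: str = "latency99ms") -> List[str]:
    """Return the operations whose <operation>_<metric> column exceeds threshold_ms."""
    return [op for op in OPERATIONS
            if row.get(f"{op}_{metric}", 0) > threshold_ms]


if __name__ == "__main__":
    # Latency values from the sample dse_perf.leases row.
    row = {
        "acquire_average_latency_ms": 0, "acquire_latency99ms": 0,
        "acquire_max_latency_ms": 0,
        "renew_average_latency_ms": 0, "renew_latency99ms": 0,
        "renew_max_latency_ms": 0,
        "resolve_average_latency_ms": 58, "resolve_latency99ms": 104,
        "resolve_max_latency_ms": 115,
        "disable_average_latency_ms": 0, "disable_latency99ms": 0,
        "disable_max_latency_ms": 0,
    }
    print(slow_operations(row))                               # ['resolve']
    print(slow_operations(row, metric="average_latency_ms"))  # []
```

In the sample row, only the resolve operation's 99th-percentile latency (104 ms) is above 100 ms; its average (58 ms) is well below it.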
Troubleshooting
Perform these lease holder troubleshooting steps before you contact DataStax Support.
- Verify the workload status:
  - Run the dsetool ring command:
    dsetool ring
    If the replication factor is inadequate or if the replicas are down, the output of the dsetool ring command contains a warning:
    Address         DC         Rack   Workload       Graph  Status  State   Load       Owns    Token                 Health[0,1)
                                                                                                   -3074457345618258603
    10.200.176.231  Analytics  rack1  Analytics(TT)  no     Up      Normal  221.96 KB  33.33%  -9223372036854775808  0.98
    10.200.176.232  Analytics  rack1  Analytics(TT)  no     Down    Down    139.05 KB  33.33%  3074457345618258602   0.0
    10.200.176.233  Analytics  rack1  Analytics(TT)  no     Up      Normal  182.66 KB  33.33%  -3074457345618258603  0.0
    Warning: only 0 of the 1 replicas for the HadoopJT.Analytics lease are alive. Sparkmaster/JobTracker nodes will not be elected until a quorum of live nodes is achieved. Increasing the replication factor of the dse_leases keyspace in this datacenter to 3 will increase reliability. {10.200.176.232=DOWN}
    If the automatic Job Tracker or Spark Master election fails, verify that an appropriate replication factor is set for the dse_leases keyspace.
- Use cqlsh commands to verify the replication factor of the analytics keyspaces:
  - Describe the dse_leases keyspace:
    cqlsh> DESCRIBE KEYSPACE dse_leases;
    CREATE KEYSPACE dse_leases WITH replication = {'class': 'NetworkTopologyStrategy', 'Analytics1': '1'} AND durable_writes = true;
  - Increase the replication factor of the dse_leases keyspace:
    cqlsh> ALTER KEYSPACE dse_leases WITH replication = {'class': 'NetworkTopologyStrategy', 'Analytics1': '3', 'Analytics2': '3'};
  - Run nodetool repair.
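When monitoring several nodes, you can extract the replica counts from the dsetool ring warning shown in the Troubleshooting steps. The Python sketch below assumes the exact wording of that sample warning ("only X of the Y replicas for the NAME lease are alive" followed by {address=DOWN}); other DSE versions may phrase it differently. The names parse_lease_warning and LeaseWarning are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

# Patterns matching the wording of the sample warning; adjust if yours differs.
WARNING_RE = re.compile(
    r"only (\d+) of the (\d+) replicas for the (\S+) lease are alive")
DOWN_RE = re.compile(r"([\w.:]+)=DOWN")


class LeaseWarning(NamedTuple):
    lease: str             # for example "HadoopJT.Analytics"
    alive: int             # replicas reported alive
    total: int             # replicas configured
    down_nodes: List[str]  # addresses listed as =DOWN


def parse_lease_warning(ring_output: str) -> Optional[LeaseWarning]:
    """Extract the lease replica warning from dsetool ring output, if present."""
    match = WARNING_RE.search(ring_output)
    if match is None:
        return None
    return LeaseWarning(
        lease=match.group(3),
        alive=int(match.group(1)),
        total=int(match.group(2)),
        down_nodes=DOWN_RE.findall(ring_output),
    )
```

A None result only means the warning text was not found; it does not by itself prove that the lease replicas are healthy.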
The location of the dse.yaml file depends on the type of installation:
Installation type | Location
Installer-Services | /etc/dse/dse.yaml
Package installations | /etc/dse/dse.yaml
Installer-No Services | install_location/resources/dse/conf/dse.yaml
Tarball installations | install_location/resources/dse/conf/dse.yaml
The location of the logback.xml file depends on the type of installation:
Installation type | Location
Installer-Services and Package installations | /etc/dse/cassandra/logback.xml
Installer-No Services and Tarball installations | install_location/resources/cassandra/conf/logback.xml