DSE Analytics tools: dse commands and dsetool

Options for starting DataStax Enterprise.

You can issue the dse commands listed in this document from the bin directory of the DataStax Enterprise Linux installation or from the command line in a packaged or AMI distribution.

The dsetool utility handles CassandraFS- and Hadoop-related tasks: in addition to managing the job tracker, discussed earlier, it can check the CassandraFS and list subranges.

DSE commands 

Synopsis
dse [-v] | cassandra [options] | hadoop [options] | hive [options]
    | mahout [options] | pig [options] | sqoop [options]

This table describes the key dse commands:

Command             Option                         Description                                                      Example
-----------------------------------------------------------------------------------------------------------------------------------------
dse                 -v                             Send the DSE version number to standard output.                  none
dse cassandra                                      Start up a real-time Cassandra node in the background.           link to example
dse cassandra       -s                             Start up a DSE Search/Solr node in the background.               link to example
dse cassandra       -s -Ddse.solr.data.dir=path    Use path to store Solr data.                                     link to example
dse cassandra       -t                             Start up an analytics node in the background.                    link to example
dse cassandra       -t -j                          Start up an analytics node as the job tracker.                   link to example
dse cassandra       -f                             Start up a real-time Cassandra node in the foreground.           none
dse cassandra       -f -t                          Start up an analytics node in the foreground.                    none
dse cassandra       -f -s                          Start up a DSE Search/Solr node in the foreground.               none
dse cassandra-stop  -p pid                         Stop the DataStax Enterprise process number pid.                 link to example
dse cassandra       -Dcassandra.replace_address    After replacing a node, replace the IP address in the table.     none
dse hadoop          version                        Send the version of the Hadoop component to standard output.     none
dse hadoop          fs options                     Invoke the Hadoop FileSystem shell.                              link to example
dse hadoop          fs -help                       Send Apache Hadoop fs command descriptions to standard output.   link to example
dse hive                                           Start the Hive client.                                           link to example
dse hive            --service name                 Start Hive by connecting through the JDBC driver.                link to example
dse list_subranges  options                        List subranges of data in a keyspace.                            link to documentation
dse mahout                                         Describe Mahout commands.                                        link to example
dse mahout          mahout command options         Run the Mahout command.                                          link to example
dse mahout          hadoop hadoop command options  Add Mahout classes to classpath and execute the hadoop command.  link to example
dse pig                                            Start Pig.                                                       link to example
dse sqoop           -help                          Send Apache Sqoop command line help to standard output.          link to example

The hadoop, hive, mahout, and pig commands must be issued from an analytics node. DSE Analytics supports the hadoop fs options described in the HDFS File System Shell Guide on the Apache Hadoop web site, with one exception: the -moveToLocal option is not yet implemented. Use -copyToLocal instead.
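
Because -moveToLocal is unavailable, a move out of the CassandraFS can be approximated by copying the file and then removing the source. The following is a minimal sketch, not a DSE feature: cfs_move_to_local and the DSE_CMD override are hypothetical names introduced here, and the sketch assumes the source is a single file that is safe to delete once the copy succeeds.

```shell
# Hypothetical helper: approximate `hadoop fs -moveToLocal` from an analytics node.
# DSE_CMD is an assumption of this sketch; it lets you substitute another
# command (for example a dry-run wrapper) for `dse`.
cfs_move_to_local() {
    src=$1   # path in the CassandraFS
    dst=$2   # path on the local file system
    # Remove the source only if the copy succeeded.
    ${DSE_CMD:-dse} hadoop fs -copyToLocal "$src" "$dst" &&
        ${DSE_CMD:-dse} hadoop fs -rm "$src"
}
```

The && keeps the source in place when the copy fails, so a failed transfer does not lose data.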

dsetool commands for Job Tracker management 

Usage: dsetool [-h|--host=<hostname>] [-p|--port=<#>] [-j|--jmxport=<#>] <command> <args>

This table describes the dsetool arguments:

Short form  Long form            Description
-h          --host <arg>         Node hostname or IP address
-j          --jmxport <arg>      Remote JMX agent port number
-p          --port <arg>         RPC agent port number
-t          --transport <arg>    RPC transport factory class
-u          --use_hadoop_config  Get Cassandra host from Hadoop configuration files

The dsetool commands are:

  • jobtracker - Return the hostname and port of the JobTracker local to the data center (DC) from which you run the command.
  • movejt - Move the JobTracker and notify the TaskTracker nodes.
  • listjt - List all JobTracker nodes, grouped by their local DC.
  • ring - List the nodes in the ring, including their node type.
  • checkcfs - Check a single CFS file or the whole CFS.
  • repaircfs - Repair the CFS from orphan blocks.
  • rebuild_indexes <keyspace> <table-name> <idx1,idx2,...> - Rebuild the specified secondary indexes for the given keyspace/table. Specify only the keyspace and table-name to rebuild all indexes.
  • list_subranges <keyspace> <cf-name> <keys_per_range> <start_token> <end_token> - Divide a token range for a given keyspace/table into smaller subranges of approximately keys_per_range keys each. To be useful, the specified range should be contained in the target node's primary range.
  • partitioner - Return the fully qualified class name of the IPartitioner in use by the cluster.

Examples of using dsetool commands for managing the Job Tracker are presented in Managing the job tracker using dsetool commands.
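
As an illustration of combining the connection arguments with the commands listed above, the following sketch runs the commands that take no arguments against a single node. The helper name cluster_overview and the DSETOOL override are hypothetical, and 7199 is assumed here as the JMX port because it is the usual Cassandra default; adjust it if your nodes differ.

```shell
# Hypothetical helper: run the argument-free dsetool commands against one node.
# DSETOOL is an assumption of this sketch; it lets you substitute another
# command for `dsetool`, for example during a dry run.
cluster_overview() {
    host=$1
    jmx_port=${2:-7199}   # assumed default JMX port
    for cmd in ring listjt jobtracker partitioner; do
        echo "== $cmd =="
        ${DSETOOL:-dsetool} -h "$host" -j "$jmx_port" "$cmd"
    done
}

# Example call (the address is a placeholder):
# cluster_overview 10.0.0.1
```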

Checking the CassandraFS using dsetool 

Use the dsetool checkcfs command to scan the CassandraFS for corrupted files. For example:
dsetool checkcfs cfs:///
To get details about a particular file that has been corrupted, pass its path to dsetool checkcfs. For example:
dsetool checkcfs /tmp/myhadoop/mapred/system/jobtracker.info

Listing sub-ranges using dsetool 

The dsetool command syntax for listing subranges of data in a keyspace is:
dsetool [-h <hostname>] list_subranges <keyspace> <table> <rows per subrange> <start token> <end token>
  • rows per subrange is the approximate number of rows per subrange.
  • start token is the start of the node's partition range.
  • end token is the end of the node's partition range.
Note: Run nodetool repair on a single node using the output of list_subranges. The ranges in the output must be partition ranges used on that node.
Example
dsetool list_subranges Keyspace1 Standard1 10000 113427455640312821154458202477256070485 0

Output

The output lists the subranges to use as input to the nodetool repair command. For example:
Start Token                             End Token                               Estimated Size
------------------------------------------------------------------------------------------------
113427455640312821154458202477256070485 132425442795624521227151664615147681247 11264
132425442795624521227151664615147681247 151409576048389227347257997936583470460 11136
151409576048389227347257997936583470460 0                                       11264

Nodetool repair command options

You need to use the nodetool utility when working with subranges. The start partition range (-st) and end partition range (-et) options specify the portion of the node needing repair. You get the values for the start and end tokens from the output of the dsetool list_subranges command. The new nodetool repair syntax for using these options is:
nodetool repair <keyspace> <table> -st <start token> -et <end token>
Example
nodetool repair Keyspace1 Standard1 -st 113427455640312821154458202477256070485 -et 132425442795624521227151664615147681247
nodetool repair Keyspace1 Standard1 -st 132425442795624521227151664615147681247 -et 151409576048389227347257997936583470460
nodetool repair Keyspace1 Standard1 -st 151409576048389227347257997936583470460 -et 0

These commands begin an anti-entropy node repair from the start partition range to the end partition range.
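
The repair commands can be generated from the list_subranges output rather than typed by hand. The following is a minimal sketch, assuming the three-column output layout shown earlier (start token, end token, estimated size); subranges_to_repair is a hypothetical helper name, not part of dsetool or nodetool. It only prints the commands so that they can be reviewed before being run.

```shell
# Hypothetical helper: read `dsetool list_subranges` output on standard input
# and print one `nodetool repair` command per subrange.
subranges_to_repair() {
    keyspace=$1
    table=$2
    # Keep only rows whose first two fields are integers; this skips the
    # header line and the dashed rule. A leading minus sign is allowed in
    # case the partitioner in use produces negative tokens.
    awk -v ks="$keyspace" -v tbl="$table" '
        $1 ~ /^-?[0-9]+$/ && $2 ~ /^-?[0-9]+$/ {
            printf "nodetool repair %s %s -st %s -et %s\n", ks, tbl, $1, $2
        }'
}

# Example call, using the values from the example above:
# dsetool list_subranges Keyspace1 Standard1 10000 113427455640312821154458202477256070485 0 |
#     subranges_to_repair Keyspace1 Standard1
```

Tokens are handled as strings, so values too large for awk's numeric type are passed through unchanged.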