Running the demo with external Mahout

Describes the steps to run the Mahout demo included with DSE on an external installation of Mahout.

The DataStax Enterprise installation includes a Mahout demo. The demo determines, with some percentage of certainty, which entries in the input data remained statistically in control and which did not. The input data is historical time series data. Using the Mahout algorithms, the demo classifies the data into categories based on whether it exhibited relatively stable behavior over a period of time. The demo produces a file of classified results. This procedure describes how to run the Mahout demo.

Procedure

Note: DataStax Demos do not work with either LDAP or internal authorization (username/password) enabled.

  1. Go to the Hadoop home directory and make the test data directory.
    cd Hadoop_home
    bin/hadoop fs -mkdir testdata
  2. Add the data from the demo directory to the test data directory in the Hadoop file system.
    bin/hadoop fs -put DSE_home/demos/mahout/synthetic_control.data testdata
  3. Go to the DSE home directory and run the demo's analysis job using byoh.
    bin/byoh mahout org.apache.mahout.clustering.syntheticcontrol.canopy.Job
    The job will take some time to complete. You can monitor the progress of the job in OpsCenter if you have it installed.
  4. When the job completes, output the classified data into a file in a temporary location.
    bin/byoh mahout clusterdump --input output/clusters-0-final --pointsDir output/clusteredPoints --output /tmp/clusteranalyze.txt
  5. Open the /tmp/clusteranalyze.txt output data file and look at the results.
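
The commands in steps 1 through 4 can be chained in a small wrapper script. The following is a minimal sketch, not part of the DSE distribution. The variable names HADOOP_HOME, DSE_HOME, and DRY_RUN and the helper run_in are assumptions made for this example. Set HADOOP_HOME and DSE_HOME to the absolute paths of your installations. By default the script only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch only: HADOOP_HOME, DSE_HOME, DRY_RUN, and run_in are assumed names
# for this example. The procedure uses the placeholders Hadoop_home and
# DSE_home; replace the defaults below with absolute paths before running.
set -eu

HADOOP_HOME="${HADOOP_HOME:-Hadoop_home}"
DSE_HOME="${DSE_HOME:-DSE_home}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = also run them

# Print a command and, unless in dry-run mode, run it from the given directory.
run_in() {
    dir=$1
    shift
    echo "(cd $dir && $*)"
    if [ "$DRY_RUN" = "0" ]; then
        (cd "$dir" && "$@")
    fi
}

# Steps 1 and 2: create the test data directory and load the demo data.
run_in "$HADOOP_HOME" bin/hadoop fs -mkdir testdata
run_in "$HADOOP_HOME" bin/hadoop fs -put \
    "$DSE_HOME/demos/mahout/synthetic_control.data" testdata

# Step 3: run the analysis job through byoh.
run_in "$DSE_HOME" bin/byoh mahout \
    org.apache.mahout.clustering.syntheticcontrol.canopy.Job

# Step 4: dump the classified data to a temporary file.
run_in "$DSE_HOME" bin/byoh mahout clusterdump \
    --input output/clusters-0-final \
    --pointsDir output/clusteredPoints \
    --output /tmp/clusteranalyze.txt
```

Because of set -e, the script stops at the first command that fails, so the clusterdump step does not run against incomplete job output.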