Using Mahout

DataStax Enterprise integrates Apache Mahout, a Hadoop component that offers machine learning libraries.

Mahout facilitates building intelligent applications that learn from data and user input. Machine learning has many use cases; some, such as the ability of web sites to recommend products to visitors based on previous visits, are well known.

Currently, Mahout jobs that use Lucene features are not supported.

Running the Mahout demo 

The DataStax Enterprise installation includes a Mahout demo. The demo determines, with some percentage of certainty, which entries in the input data remained statistically in control and which did not. The input data is historical time series data. Using the Mahout algorithms, the demo classifies the data into categories based on whether it exhibited relatively stable behavior over a period of time, and produces a file of classified results. This procedure describes how to run the Mahout demo.

Procedure

Note: DataStax Demos do not work with either LDAP or internal authorization (username/password) enabled.

  1. After installing DataStax Enterprise, start an analytics node.
  2. Go to the demos/mahout directory.
    The default location of the demos/mahout directory depends on the type of installation:
    Installer-Services and Package installations: /usr/share/dse/demos/mahout
    Installer-No Services and Tarball installations: install_location/demos/mahout
  3. Run the script in the demos directory. For example, on Linux:
    ./run_mahout_example.sh

    If you are running OpsCenter, you can now view the Hadoop job progress.

    When the demo completes, a message appears on the standard output about the location of the output file. For example:

    The output is in /tmp/clusteranalyze.txt
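
Steps 2 and 3 of the procedure can be sketched as a small shell script. This is only an illustration under stated assumptions: the helper function mahout_demo_dir and the DSE_HOME variable are hypothetical names introduced here to stand in for install_location, and the output path is the example path from the message above, which may differ on your system.

```shell
#!/bin/sh
# Resolve the demos/mahout directory.
# With no argument, assume an Installer-Services or Package installation.
# With an argument, treat it as install_location (Installer-No Services or Tarball).
mahout_demo_dir() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1/demos/mahout"
    else
        printf '%s\n' "/usr/share/dse/demos/mahout"
    fi
}

# DSE_HOME is a hypothetical variable used only in this sketch.
demo_dir=$(mahout_demo_dir "${DSE_HOME:-}")

if [ -x "$demo_dir/run_mahout_example.sh" ]; then
    # Run the script from inside the demo directory (steps 2 and 3).
    (cd "$demo_dir" && ./run_mahout_example.sh)
    # Example output path taken from the completion message; yours may differ.
    if [ -f /tmp/clusteranalyze.txt ]; then
        head -n 20 /tmp/clusteranalyze.txt
    fi
else
    echo "Mahout demo script not found in $demo_dir" >&2
fi
```

Leave DSE_HOME unset for an Installer-Services or Package installation. Set it to your install_location for the other installation types.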