Run the Apache Spark MLlib demo application

The Apache Spark™ MLlib demo application demonstrates how to run machine-learning analytic jobs using Apache Spark and DSE. The demo solves the classic iris flower classification problem. It uses the iris flower dataset to build a Naive Bayes classifier that identifies an iris species from four feature measurements (sepal length, sepal width, petal length, and petal width).
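To show what "build a Naive Bayes classifier" involves, here is a minimal from-scratch sketch in plain Python. It is not the demo's code and not MLlib's implementation. It uses the multinomial model with additive (Laplace) smoothing of 1.0, which are MLlib's `NaiveBayes` defaults. The demo's source is not shown here, so whether it uses this model type or these parameters is an assumption. The function names `train_multinomial_nb` and `predict` are hypothetical, and the training set is only nine rows of the iris dataset rather than the demo's `iris.csv`.

```python
import math
from collections import defaultdict


def train_multinomial_nb(rows, smoothing=1.0):
    """Fit multinomial Naive Bayes with additive (Laplace) smoothing.

    rows: list of (features, label) pairs; features must be non-negative.
    Returns per-class log priors and per-class, per-feature log likelihoods.
    """
    num_features = len(rows[0][0])
    counts = defaultdict(int)                          # number of rows per class
    sums = defaultdict(lambda: [0.0] * num_features)   # feature totals per class
    for features, label in rows:
        if any(v < 0 for v in features):
            raise ValueError("multinomial Naive Bayes needs non-negative features")
        counts[label] += 1
        for j, v in enumerate(features):
            sums[label][j] += v

    labels = sorted(counts)
    prior_denom = math.log(len(rows) + len(labels) * smoothing)
    log_prior = {c: math.log(counts[c] + smoothing) - prior_denom for c in labels}
    log_theta = {}
    for c in labels:
        theta_denom = math.log(sum(sums[c]) + num_features * smoothing)
        log_theta[c] = [math.log(s + smoothing) - theta_denom for s in sums[c]]
    return log_prior, log_theta


def predict(model, features):
    """Return the class with the highest log posterior (up to a shared constant)."""
    log_prior, log_theta = model

    def score(c):
        return log_prior[c] + sum(w * x for w, x in zip(log_theta[c], features))

    return max(log_prior, key=score)


# Nine rows of the iris dataset: sepal length, sepal width,
# petal length, petal width (cm), followed by the species label.
TRAINING = [
    ((5.1, 3.5, 1.4, 0.2), "Iris-setosa"),
    ((4.9, 3.0, 1.4, 0.2), "Iris-setosa"),
    ((4.7, 3.2, 1.3, 0.2), "Iris-setosa"),
    ((7.0, 3.2, 4.7, 1.4), "Iris-versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "Iris-versicolor"),
    ((6.9, 3.1, 4.9, 1.5), "Iris-versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "Iris-virginica"),
    ((5.8, 2.7, 5.1, 1.9), "Iris-virginica"),
    ((7.1, 3.0, 5.9, 2.1), "Iris-virginica"),
]

model = train_multinomial_nb(TRAINING)
print(predict(model, (5.0, 3.4, 1.5, 0.2)))  # → Iris-setosa
```

The multinomial model treats each measurement as a non-negative "count," so the classifier effectively compares the proportions of the four measurements. Small setosa petals give a very different profile from the longer petals of versicolor and virginica.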

Prerequisites

We strongly recommend that you install the BLAS library on your machines before running Spark MLlib jobs.

The BLAS library is not distributed with DSE because of licensing restrictions, but it significantly improves MLlib performance.

You must have the Gradle build tool installed to build the demo. See https://gradle.org/ for details on installing Gradle on your OS.
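Before you start, you can check both prerequisites on a node with a short script. This is a minimal sketch, and `check_prerequisites` is a hypothetical helper. It only asks whether the dynamic linker can locate a library named `blas` or `openblas` (through `ctypes.util.find_library`) and whether a `gradle` executable is on the `PATH` (through `shutil.which`). A positive result does not confirm that Spark MLlib will actually load that BLAS library.

```python
import ctypes.util
import shutil


def check_prerequisites():
    """Report what was found for BLAS and Gradle; None means not found."""
    # find_library returns a library name or path, or None if nothing matches.
    blas = ctypes.util.find_library("blas") or ctypes.util.find_library("openblas")
    # which returns the full path of the executable on PATH, or None.
    gradle = shutil.which("gradle")
    return {"blas": blas, "gradle": gradle}


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {found or 'NOT FOUND'}")
```

Because BLAS must be present on every machine that runs MLlib jobs, run the check on each node, not only on the one where you build the demo.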

Procedure

Start the nodes in Analytics mode.
- Package installations: See Start DataStax Enterprise as a service.
- Tarball installations: See Start DataStax Enterprise as a standalone process.
In a terminal, go to the spark-mllib directory located in the Spark demo directory.

The default location of the Spark demo depends on the type of installation:
- Package installations: /usr/share/dse/demos/portfolio_manager
- Tarball installations: installation_location/demos/portfolio_manager
Build the application with the Gradle build tool:
```
gradle
```
Use spark-submit to submit the application JAR.

On each node, the Spark MLlib demo application reads the file iris.csv from the spark-mllib subdirectory of the Spark demo directory, so this file must be available at the same path on every node. If the nodes do not share the same local file path, set up a shared network location that all nodes in the cluster can access.

To run the application when every node has iris.csv at the same local path:
```
dse spark-submit NaiveBayesDemo.jar
```
To specify a shared location of iris.csv:
```
dse spark-submit NaiveBayesDemo.jar /mnt/shared/iris.csv
```
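If you script the spark-submit step, it can help to check that iris.csv parses before you submit the job. The following is a minimal sketch. `validate_iris_csv` and `submit_demo` are hypothetical helpers, and the expected file layout is an assumption: four comma-separated numeric measurements followed by a class label, with no header row, as in the UCI iris data. The layout of the demo's own iris.csv is not documented here. The script checks only the copy visible from the machine where you run it, and every other node still needs the file at the same path.

```python
import csv
import shlex
import subprocess


def validate_iris_csv(path):
    """Count data rows, raising ValueError at the first malformed one.

    Assumes (hypothetically) four numeric measurements followed by a
    class label on each line, comma-separated, with no header row.
    """
    rows = 0
    with open(path, newline="") as f:
        for lineno, record in enumerate(csv.reader(f), start=1):
            if not any(field.strip() for field in record):
                continue  # skip blank lines, such as a trailing newline
            if len(record) != 5:
                raise ValueError(f"{path}:{lineno}: expected 5 fields, got {len(record)}")
            try:
                for value in record[:4]:
                    float(value)
            except ValueError:
                raise ValueError(f"{path}:{lineno}: non-numeric measurement") from None
            rows += 1
    return rows


def submit_demo(iris_path=None, jar="NaiveBayesDemo.jar", dry_run=False):
    """Validate the local copy of iris.csv, then run `dse spark-submit`.

    With no iris_path the demo reads its default iris.csv, so this checks
    ./iris.csv (run it from the spark-mllib directory).
    """
    validate_iris_csv(iris_path or "iris.csv")
    argv = ["dse", "spark-submit", jar]
    if iris_path is not None:
        argv.append(iris_path)  # e.g. a shared location such as /mnt/shared/iris.csv
    print("Running:", shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return argv
```

Calling `submit_demo("/mnt/shared/iris.csv")` from the spark-mllib directory runs the same command as the shared-location example, after the file has been validated. With `dry_run=True`, it only prints and returns the command.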
