Using Apache SparkR™ with DataStax Enterprise
Apache SparkR is a front end to Apache Spark for the R programming language, used to create analytics applications. DataStax Enterprise (DSE) integrates SparkR to support creating data frames from DSE data.
SparkR support in DSE requires that you first install R on the client machines that use SparkR. To use R user-defined functions and distributed functions, install the same version of R on all nodes in the Analytics cluster. DSE SparkR is built against R version 3.1.1. Many Linux distributions install an older version of R by default.
For example, on Debian and Ubuntu clients:
$ sudo sh -c 'echo "deb http://cran.rstudio.com/bin/linux/ubuntu trusty/" >> /etc/apt/sources.list'
$ gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
$ gpg -a --export E084DAB9 | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install r-base
On Red Hat and CentOS clients:
$ sudo yum install R
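Because distribution packages may provide an older R, it can help to confirm the installed version after installation. The following is a minimal sketch, not part of DSE: the function names `version_ge`, `installed_r_version`, and `check_r_version` are hypothetical helpers, and the comparison assumes GNU `sort` with the `-V` (version sort) option is available.

```shell
# Version that DSE SparkR is built against, as stated in this topic.
REQUIRED_R_VERSION="3.1.1"

# version_ge A B: succeeds when version A is greater than or equal to version B.
# Assumes GNU sort, which provides the -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# installed_r_version: prints the installed R version number, such as 3.1.1.
# Fails if the R executable is not on the PATH.
installed_r_version() {
    command -v R >/dev/null 2>&1 || return 1
    # The first line of `R --version` has the form "R version X.Y.Z (date) ...".
    R --version | sed -n '1s/^R version \([0-9][0-9.]*\).*/\1/p'
}

# check_r_version: reports whether the installed R meets the required version.
check_r_version() {
    current="$(installed_r_version)" || { echo "R is not installed"; return 1; }
    if [ -z "$current" ]; then
        echo "Could not determine the installed R version"
        return 1
    fi
    if version_ge "$current" "$REQUIRED_R_VERSION"; then
        echo "R $current meets the required version $REQUIRED_R_VERSION"
    else
        echo "R $current is older than $REQUIRED_R_VERSION"
        return 1
    fi
}
```

After sourcing these definitions, run `check_r_version` on each client. Running it on every Analytics node as well is one way to spot version mismatches before using R user-defined functions.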
Starting SparkR
Start the SparkR shell using the dse command, which automatically sets the Spark session within R:
$ dse sparkR