Using SparkR with DataStax Enterprise
DataStax Enterprise integrates SparkR to support creating data frames from DSE data.
Apache SparkR is an R front end to Apache Spark for creating analytics applications.
SparkR support in DSE requires you to first install R on the client machines on which you will be using SparkR. To use R user-defined functions and distributed functions, install the same version of R on all nodes in the Analytics cluster. DSE SparkR is built against R version 3.1.1. Many Linux distributions install older versions of R by default.
For example, on Debian and Ubuntu clients:
sudo sh -c 'echo "deb http://cran.rstudio.com/bin/linux/ubuntu trusty/" >> /etc/apt/sources.list' &&
gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9 &&
gpg -a --export E084DAB9 | sudo apt-key add - &&
sudo apt-get update &&
sudo apt-get install r-base
On RedHat and CentOS clients:
sudo yum install R
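After installation, it can help to confirm which R version a machine actually has, since distribution packages may be older than the 3.1.1 that DSE SparkR is built against. The following is a minimal sketch, not part of DSE: version_ge and check_r_version are hypothetical helper names, treating 3.1.1 as a minimum is an assumption drawn from the note above, and the comparison assumes a sort that supports -V (GNU coreutils does).

```shell
#!/bin/sh
# R version that DSE SparkR is built against, per the text above.
# Treating it as a minimum is an assumption of this sketch.
REQUIRED_R_VERSION="3.1.1"

# version_ge A B: succeeds when version A is greater than or equal to B.
# Assumes sort(1) supports -V (version sort), as GNU coreutils does.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_r_version: hypothetical helper that reports whether the local
# R installation is at least REQUIRED_R_VERSION.
check_r_version() {
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "R is not installed" >&2
        return 1
    fi
    # getRversion() is base R; print it as a plain string such as 3.4.4.
    installed="$(Rscript -e 'cat(as.character(getRversion()))')"
    if version_ge "$installed" "$REQUIRED_R_VERSION"; then
        echo "R $installed is at least $REQUIRED_R_VERSION"
    else
        echo "R $installed is older than $REQUIRED_R_VERSION" >&2
        return 1
    fi
}
```

Running check_r_version on each client, and on each Analytics node when R user-defined functions are used, gives a quick way to spot machines with a different or older R.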
Starting SparkR
Start the SparkR shell using the dse command to automatically set the Spark session within R.
Start the R shell using the dse command:
dse sparkR
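Because the SparkR shell depends on both R and the dse command being available on the client, a small wrapper can check for them before launching. This is only a sketch under that assumption: require_cmd and start_sparkr are hypothetical names, and the only DSE invocation used is the dse sparkR command shown above.

```shell
#!/bin/sh
# require_cmd NAME: succeeds when NAME can be found on the PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# start_sparkr: hypothetical wrapper that verifies the prerequisites
# described in this topic, then starts the SparkR shell.
start_sparkr() {
    for cmd in R dse; do
        if ! require_cmd "$cmd"; then
            echo "missing required command: $cmd" >&2
            return 1
        fi
    done
    dse sparkR
}
```

Calling start_sparkr from a login script or terminal then fails early with a clear message on machines where R has not been installed yet.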