Using SparkR with DataStax Enterprise

Apache SparkR is an R front end to Apache Spark for creating analytics applications. DataStax Enterprise (DSE) integrates SparkR so that you can create data frames from DSE data.

SparkR support in DSE requires R to be installed on every client machine where you will use SparkR. To use R user-defined functions and distributed functions, install the same version of R on all nodes in the Analytics cluster. DSE SparkR is built against R version 3.1.1. Many Linux distributions install an older version of R by default.

For example, on Debian and Ubuntu clients:

sudo sh -c 'echo "deb http://cran.rstudio.com/bin/linux/ubuntu trusty/" >> /etc/apt/sources.list' && 
gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9 && 
gpg -a --export E084DAB9 | sudo apt-key add - && 
sudo apt-get update && 
sudo apt-get install r-base

On RedHat and CentOS clients:

sudo yum install R
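
After installing, it can help to confirm that the R version on a machine is not older than the version DSE SparkR is built against. The following is a minimal sketch, not part of DSE: the helper names version_ge, installed_r_version, and check_r are hypothetical, the comparison relies on GNU sort -V, and treating 3.1.1 as a lower bound is an assumption based on the note above about older distribution packages.

```shell
#!/bin/sh
# Version DSE SparkR is built against, taken from the text above.
# Treating it as a minimum is an assumption of this sketch.
REQUIRED_R_VERSION="3.1.1"

# version_ge A B: succeeds when version A >= version B.
# Relies on GNU sort -V (version sort) being available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# installed_r_version: prints the local R version, or nothing if R is not on PATH.
installed_r_version() {
    command -v R >/dev/null 2>&1 || return 0
    # The first line of `R --version` looks like: R version 3.4.4 (2018-03-15) ...
    R --version 2>/dev/null | sed -n '1s/^R version \([0-9][0-9.]*\).*/\1/p'
}

# check_r: reports whether the local R installation meets the assumed minimum.
check_r() {
    found="$(installed_r_version)"
    if [ -z "$found" ]; then
        echo "R is not installed on this machine"
    elif version_ge "$found" "$REQUIRED_R_VERSION"; then
        echo "R $found is at least $REQUIRED_R_VERSION"
    else
        echo "R $found is older than $REQUIRED_R_VERSION"
    fi
}

check_r
```

Run the script on each client and, if you plan to use R user-defined functions, on each Analytics node, then compare the reported versions to make sure they match across the cluster.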

Starting SparkR

Start the SparkR shell with the dse command, which automatically sets up the Spark session within R.

  1. Start the SparkR shell:
    dse sparkR