Connecting to DataStax Enterprise using the Spark shell on an external Spark cluster

Use the generated byos.properties configuration file and the byos-version.jar from a DataStax Enterprise node to connect to the DataStax Enterprise cluster from the Spark shell on an external Spark cluster.

Where is the clients directory?

The default location of the clients directory depends on the type of installation:

Installation type                                               Location
Package installations and Installer-Services installations      /usr/share/dse/clients
Tarball installations and Installer-No Services installations   <installation_location>/clients
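To confirm the exact jar file name before copying it, you can list the clients directory on the DataStax Enterprise node. The following is a minimal sketch, not part of DataStax Enterprise: find_byos_jar is a hypothetical helper, and the dse-byos*.jar pattern is an assumption based on the example file name used in the procedure on this page.

```shell
#!/bin/sh
# Hypothetical helper: print the first dse-byos jar found in a clients directory.
# The file name pattern dse-byos*.jar is an assumption, not a documented guarantee.
find_byos_jar() {
    # $1: path to the clients directory for your installation type
    for jar in "$1"/dse-byos*.jar; do
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    echo "no dse-byos jar found in $1" >&2
    return 1
}
```

For a package installation you would call it as find_byos_jar /usr/share/dse/clients, and for a tarball installation as find_byos_jar <installation_location>/clients.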

Prerequisites

You must generate the byos.properties file on a node in your DataStax Enterprise cluster.

Procedure

  1. Copy the previously generated byos.properties file from the DataStax Enterprise node to the local Spark node.

    scp user@dsenode1.example.com:~/byos.properties .

    If you are using Kerberos authentication, the byos.properties file must be generated with the --generate-token and --token-renewer <username> options, as described in dse client-tool configuration byos-export.

  2. Copy the byos-version.jar file from the clients directory on a node in your DataStax Enterprise cluster to the local Spark node.

    The byos-version.jar file location depends on the type of installation. The following example copies the file from a package installation and saves it locally as byos-5.0.jar.

    scp user@dsenode1.example.com:/usr/share/dse/clients/dse-byos_2.10-5.0.1-5.0.0-all.jar byos-5.0.jar

  3. Merge external Spark properties into byos.properties.

    cat ${SPARK_HOME}/conf/spark-defaults.conf >> byos.properties

  4. (Optional) If you are using Kerberos authentication, set up a cron job or other scheduled task to periodically run dse client-tool cassandra renew-token <token>, where <token> is the encoded token string in byos.properties.

  5. Start the Spark shell using the byos.properties and byos-version.jar files.

    spark-shell --jars byos-5.0.jar --properties-file byos.properties
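Because step 3 appends spark-defaults.conf to byos.properties, a property that is set in both files ends up in the merged file twice. The following sketch lists such keys so you can review them before merging. It is a hypothetical helper, not part of DataStax Enterprise or Spark, and it assumes each property line starts with the key, followed by whitespace, = or :.

```shell
#!/bin/sh
# Hypothetical helpers for reviewing property files before merging them.

# Print the unique property keys in a file, one per line.
# Assumption: the key is the first token on each non-comment line,
# terminated by whitespace, '=' or ':'.
property_keys() {
    # $1: path to a properties file
    grep -v '^[[:space:]]*#' "$1" \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]=:].*$//' \
        | grep -v '^$' \
        | sort -u
}

# Print the keys that are defined in both files.
duplicate_keys() {
    # $1: spark-defaults.conf, $2: byos.properties
    keys_a=$(mktemp)
    keys_b=$(mktemp)
    property_keys "$1" > "$keys_a"
    property_keys "$2" > "$keys_b"
    comm -12 "$keys_a" "$keys_b"
    rm -f "$keys_a" "$keys_b"
}
```

Run it before the merge, for example as duplicate_keys "${SPARK_HOME}/conf/spark-defaults.conf" byos.properties. If it prints any keys, decide which value you want and remove the other entry from the relevant file.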

© 2024 DataStax | Privacy policy | Terms of use
