dse client-tool spark

Perform operations related to integrated Spark.

Synopsis

$ dse client-tool connection_options spark
(master-address | leader-address | version |
sql-schema (--exclude | --keyspace | --table | --decimal | --all) |
metastore-migrate --from_version --to_version)

Syntax conventions

UPPERCASE

Literal keyword.

Lowercase

Not literal.

Italics

Variable value. Replace with a valid option or user-defined value.

[ ]

Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the square brackets.

( )

Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

|

Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not type the vertical bar.

...

Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required.

'Literal string'

Single quotation marks ( ' ) must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.

{ key:value }

Map collection. Braces ( { } ) enclose map collections or key-value pairs. A colon separates the key and the value.

<datatype1,datatype2>

Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.

cql_statement;

End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ]

Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options.

' <schema> … </schema> '

Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type'

Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrconfig files.

Commands

leader-address

Returns the IP address of the currently selected Spark Master for the datacenter.

master-address

Returns the address used to configure Spark applications. The address is returned as a URI of this form:

dse://ip:port?connection.local_dc=dc_name;connection.host=cs_list_contactpoints

The connection.host=cs_list_contactpoints option is a comma-separated list of the IP addresses of additional contact points. These are up to five randomly selected nodes from the datacenter.

DSE automatically connects Spark applications to the current Spark Master, so you do not need to include the Spark Master's IP address in the connection URI.
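A script sometimes needs individual parts of this URI, such as the local datacenter name or the contact points. The following is a minimal sketch using plain bash parameter expansion. It assumes the single ip:port form shown above, with options separated by semicolons; the function name parse_dse_uri is hypothetical and is not part of DSE.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DSE). Splits a URI of the form
#   dse://ip:port?connection.local_dc=dc_name;connection.host=cs_list_contactpoints
# into one "name=value" line per component. Assumes an explicit :port is present.
parse_dse_uri() {
  local uri="$1"
  local authority="${uri#dse://}"   # strip the dse:// scheme
  authority="${authority%%\?*}"     # drop everything from the first '?' onward

  local query=""
  case "$uri" in
    *\?*) query="${uri#*\?}" ;;      # keep everything after the first '?'
  esac

  printf 'host=%s\n' "${authority%:*}"
  printf 'port=%s\n' "${authority##*:}"

  # Options in this URI format are separated by ';'.
  local -a opts=()
  IFS=';' read -r -a opts <<< "$query"
  local opt
  for opt in "${opts[@]}"; do
    printf '%s\n' "$opt"
  done
}
```

For example, parse_dse_uri "$(dse client-tool spark master-address)" prints one line each for host, port, connection.local_dc, and connection.host. For the master-address example URI on this page, the lines are host=10.200.181.62, port=9042, connection.local_dc=Analytics, and connection.host=10.200.181.63.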

metastore-migrate --from_version --to_version

Migrates the Spark SQL metastore from one DSE version to another.

  • --from_version - the DSE version to migrate the metastore from

  • --to_version - the DSE version to migrate the metastore to

version

Returns the version of Spark that is bundled with DataStax Enterprise.

sql-schema (--exclude | --keyspace | --table | --decimal | --all)

Exports the SQL table creation query with these options:

  • --table tablename - comma-separated list of tables to include

  • --exclude csvlist - comma-separated list of tables to exclude

  • --all - includes all keyspaces

  • --keyspace csvlist - comma-separated list of keyspaces to include
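The sql-schema options can also be scripted. This minimal sketch writes one schema file per keyspace by running sql-schema --keyspace once for each name given. The function export_keyspace_schemas, its arguments, and the keyspace.sql file naming are hypothetical. It also assumes that dse is on the PATH of a machine where client-tool can connect to the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DSE). Writes the Spark SQL table creation
# queries for each keyspace to its own file, <outdir>/<keyspace>.sql.
# Usage: export_keyspace_schemas OUTDIR KEYSPACE [KEYSPACE ...]
export_keyspace_schemas() {
  local outdir="$1"
  shift
  mkdir -p "$outdir" || return 1
  local ks
  for ks in "$@"; do
    # --keyspace accepts a comma-separated list; here it gets a single name.
    dse client-tool spark sql-schema --keyspace "$ks" > "$outdir/$ks.sql" || return 1
  done
}
```

For example, export_keyspace_schemas ./schemas ks1 ks2 would write ./schemas/ks1.sql and ./schemas/ks2.sql, where ks1 and ks2 stand in for your own keyspace names. A single call with --keyspace ks1,ks2 instead produces one combined file.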

Examples

View the Spark connection URL for this datacenter:

$ dse client-tool spark master-address
dse://10.200.181.62:9042?connection.local_dc=Analytics;connection.host=10.200.181.63

View the IP address of the current Spark Master in this datacenter:

$ dse client-tool spark leader-address
10.200.181.62

Generate Spark SQL schema files

You can use the generated schema files with Spark SQL on external Spark clusters.

$ dse client-tool --use-server-config spark sql-schema --all > output.sql

Migrate Spark metastore

After upgrading, to migrate custom external tables in the Hive metastore used by Spark SQL from the DSE 5.0.11 format to the DSE 6.0.0 format:

$ dse client-tool spark metastore-migrate --from 5.0.11 --to 6.0.0


© 2024 DataStax

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.
