Running HiveQL queries using Apache Spark™ SQL

Spark SQL supports queries written in HiveQL, a SQL-like language whose queries are converted to Spark jobs. HiveQL is more mature and supports more complex queries than Spark SQL. To run a HiveQL query, first obtain a HiveContext instance (the Spark shell provides one as sqlContext), and then submit the query by calling the sql method on that instance.
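Outside the Spark shell, an application has to create the HiveContext itself. The following is a minimal sketch of that pattern, not a definitive implementation. It assumes a Spark version in which org.apache.spark.sql.hive.HiveContext is available and sql returns a DataFrame. The object name HiveQLExample and the table my_keyspace.my_table are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical application object; the name is illustrative only.
object HiveQLExample {
  def main(args: Array[String]): Unit = {
    // The master URL and other settings are expected to come from the
    // submit command or the cluster configuration.
    val conf = new SparkConf().setAppName("HiveQLExample")
    val sc = new SparkContext(conf)

    // Create the HiveContext on top of the existing SparkContext.
    val hiveContext = new HiveContext(sc)

    // Submit a HiveQL query; my_keyspace.my_table is a placeholder table.
    val results = hiveContext.sql("SELECT * FROM my_keyspace.my_table")

    // Print the first rows of the result to standard output.
    results.show()

    sc.stop()
  }
}
```

In the Spark shell this setup is unnecessary, because the shell already provides a HiveContext.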

See the Hive Language Manual for the full syntax of HiveQL.

Creating indexes with DEFERRED REBUILD is not supported in Spark SQL.

Procedure

  1. Start the Spark shell.

    bin/dse spark

  2. Run a HiveQL query by calling the sql method on sqlContext, the HiveContext instance that the shell provides.

    scala> val results = sqlContext.sql("SELECT * FROM my_keyspace.my_table")
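The value returned by sql can then be inspected in the same shell session. The lines below are a sketch under two assumptions: that they are entered in the Spark shell, where sqlContext is already defined, and that the Spark version returns a DataFrame from sql (Spark 1.3 or later). The table name is the same placeholder used in the procedure.

```scala
// Run inside the Spark shell; sqlContext is provided by the shell.
val results = sqlContext.sql("SELECT * FROM my_keyspace.my_table")

// Print the column names and types of the result.
results.printSchema()

// Display up to 10 rows in tabular form.
results.show(10)

// Count the rows; this triggers a Spark job over the full result.
val rowCount = results.count()
println(s"Rows returned: $rowCount")
```

The call to sql only defines the query. Spark does the work when an action such as show or count is called.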
