Supported syntax of Spark SQL

Spark SQL supports a subset of the SQL-92 language.

The following syntax defines a SELECT query.

SELECT [DISTINCT] [column names]|[wildcard] 
FROM [keyspace name.]table name 
[JOIN clause table name ON join condition]
[WHERE condition]
[GROUP BY column name]
[HAVING conditions]
[ORDER BY column names [ASC | DESC]]
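The clauses are written in the order shown, but standard SQL applies them in a different logical order: WHERE filters individual rows first, GROUP BY then forms groups, HAVING filters those groups, and ORDER BY sorts the final result. The following plain-Python sketch emulates that order on in-memory rows. It illustrates the semantics only and is not Spark code; the `shop.orders` table, its columns, and the `run_query` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of a table named shop.orders (keyspace "shop").
ORDERS = [
    {"customer": "alice", "amount": 15},
    {"customer": "bob", "amount": 5},
    {"customer": "alice", "amount": 10},
    {"customer": "carol", "amount": -3},
    {"customer": "carol", "amount": 30},
]


def run_query(rows):
    """Emulate:
    SELECT customer, SUM(amount) FROM shop.orders
    WHERE amount > 0
    GROUP BY customer
    HAVING SUM(amount) >= 20
    ORDER BY customer DESC
    """
    # 1. WHERE filters individual rows before any grouping happens.
    filtered = [r for r in rows if r["amount"] > 0]

    # 2. GROUP BY collects the surviving rows per customer and aggregates them.
    totals = defaultdict(int)
    for r in filtered:
        totals[r["customer"]] += r["amount"]

    # 3. HAVING filters whole groups, using the aggregated value.
    kept = [(customer, total) for customer, total in totals.items() if total >= 20]

    # 4. ORDER BY ... DESC sorts the final result in descending order.
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


print(run_query(ORDERS))  # [('carol', 30), ('alice', 25)]
```

Bob's group is removed by HAVING because its total is only 5, and carol's negative amount is removed by WHERE before her total is computed.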

A SELECT query using joins has the following syntax.

SELECT statement
FROM statement
[JOIN | INNER JOIN | LEFT JOIN | LEFT SEMI JOIN | LEFT OUTER JOIN | RIGHT JOIN | RIGHT OUTER JOIN | FULL JOIN | FULL OUTER JOIN] table name
ON join condition
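The join types differ mainly in what happens to rows that have no match. The plain-Python sketch below emulates three of them over hypothetical `USERS` and `ORDERS` lists: INNER JOIN drops unmatched rows, LEFT OUTER JOIN (LEFT JOIN for short) keeps unmatched left rows with NULL, here `None`, in the right-hand columns, and LEFT SEMI JOIN returns only left-table columns, each matching left row at most once. RIGHT and FULL OUTER JOIN follow the same pattern from the other side or from both sides. This illustrates the semantics only and is not how Spark executes joins; all names are hypothetical.

```python
# Hypothetical left table (users) and right table (orders).
USERS = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "carol"},
]
ORDERS = [
    {"user_id": 1, "item": "book"},
    {"user_id": 1, "item": "pen"},
    {"user_id": 3, "item": "lamp"},
]


def same_user(user, order):
    # Join condition: users.id = orders.user_id
    return user["id"] == order["user_id"]


def inner_join(left, right, on):
    # JOIN / INNER JOIN: one row per matching pair; unmatched rows are dropped.
    return [{**lrow, **rrow} for lrow in left for rrow in right if on(lrow, rrow)]


def left_outer_join(left, right, on, right_cols):
    # LEFT JOIN / LEFT OUTER JOIN: like an inner join, but a left row with no
    # match is kept once, with every right-hand column set to None (NULL).
    out = []
    for lrow in left:
        matches = [{**lrow, **rrow} for rrow in right if on(lrow, rrow)]
        out.extend(matches or [{**lrow, **dict.fromkeys(right_cols)}])
    return out


def left_semi_join(left, right, on):
    # LEFT SEMI JOIN: each left row that has at least one match, returned
    # once and with left-table columns only.
    return [lrow for lrow in left if any(on(lrow, rrow) for rrow in right)]


print(len(inner_join(USERS, ORDERS, same_user)))  # 3
print(len(left_outer_join(USERS, ORDERS, same_user, ["user_id", "item"])))  # 4
print([u["name"] for u in left_semi_join(USERS, ORDERS, same_user)])  # ['alice', 'carol']
```

Alice appears twice in the inner and outer joins because she has two orders, but only once in the semi join.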

Several SELECT statements can be combined in a UNION, INTERSECT, or EXCEPT query.

SELECT statement 1
[UNION | UNION ALL | UNION DISTINCT | INTERSECT | EXCEPT]
SELECT statement 2
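UNION removes duplicate rows from the combined result (UNION DISTINCT states this explicitly), while UNION ALL keeps every row; INTERSECT returns the distinct rows that appear in both results. A minimal plain-Python sketch of these semantics, assuming each row is represented as a hashable tuple; the helper names and data are hypothetical and this is not Spark's implementation:

```python
def union_all(a, b):
    # UNION ALL: concatenate both results and keep every duplicate.
    return a + b


def union_distinct(a, b):
    # UNION / UNION DISTINCT: concatenate, then drop duplicate rows.
    # dict.fromkeys keeps the first occurrence of each row.
    return list(dict.fromkeys(a + b))


def intersect(a, b):
    # INTERSECT: distinct rows that occur in both results.
    in_b = set(b)
    return list(dict.fromkeys(row for row in a if row in in_b))


# Hypothetical single-column results of two SELECT statements.
first = [("alice",), ("bob",), ("bob",)]
second = [("bob",), ("carol",)]

print(union_all(first, second))       # 5 rows, ('bob',) appears three times
print(union_distinct(first, second))  # [('alice',), ('bob',), ('carol',)]
print(intersect(first, second))       # [('bob',)]
```

SQL does not guarantee row order without ORDER BY, so compare such results as sets, or add an ORDER BY clause when order matters.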

Note: SELECT queries run on new columns return '' (an empty result) instead of None.

The following syntax defines an INSERT query.

INSERT [OVERWRITE] INTO [keyspace name.]table name
VALUES values

The following syntax defines a CACHE TABLE query.

CACHE TABLE table name [AS table alias]

You can remove a table from the cache using an UNCACHE TABLE query.

UNCACHE TABLE table name

Keywords in Spark SQL

The following keywords are reserved in Spark SQL.

  • ABS
  • ALL
  • AND
  • APPROXIMATE
  • AS
  • ASC
  • AVG
  • BETWEEN
  • BY
  • CACHE
  • CAST
  • COUNT
  • DESC
  • DISTINCT
  • EXCEPT
  • FALSE
  • FIRST
  • FROM
  • FULL
  • GROUP
  • HAVING
  • IF
  • IN
  • INNER
  • INSERT
  • INTERSECT
  • INTO
  • IS
  • JOIN
  • LAST
  • LEFT
  • LIKE
  • LIMIT
  • LOWER
  • MAX
  • MIN
  • NOT
  • NULL
  • ON
  • OR
  • ORDER
  • OUTER
  • OVERWRITE
  • REGEXP
  • RIGHT
  • RLIKE
  • SELECT
  • SEMI
  • SQRT
  • STRING
  • SUBSTR
  • SUBSTRING
  • SUM
  • TABLE
  • TIMESTAMP
  • TRUE
  • UNCACHE
  • UNION
  • UPPER
  • WHERE
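Reserved keywords generally cannot be used as unquoted table or column names, which is easy to run into with Cassandra columns named, for example, timestamp or count. Spark SQL accepts identifiers enclosed in backticks; the sketch below assumes that quoting is supported by the Spark SQL version you run and wraps any name from the list above in backticks before it is placed into a query string. `RESERVED` and `quote_identifier` are hypothetical helpers, not part of Spark.

```python
# Keywords listed above as reserved in Spark SQL.
RESERVED = frozenset("""
    ABS ALL AND APPROXIMATE AS ASC AVG BETWEEN BY CACHE CAST COUNT DESC
    DISTINCT EXCEPT FALSE FIRST FROM FULL GROUP HAVING IF IN INNER INSERT
    INTERSECT INTO IS JOIN LAST LEFT LIKE LIMIT LOWER MAX MIN NOT NULL ON OR
    ORDER OUTER OVERWRITE REGEXP RIGHT RLIKE SELECT SEMI SQRT STRING SUBSTR
    SUBSTRING SUM TABLE TIMESTAMP TRUE UNCACHE UNION UPPER WHERE
""".split())


def quote_identifier(name: str) -> str:
    """Wrap a table or column name in backticks if it is a reserved keyword."""
    if "`" in name:
        # Hypothetical policy: escaping embedded backticks is not handled here.
        raise ValueError(f"unsupported identifier: {name!r}")
    # Keywords are matched case-insensitively, so compare in upper case.
    return f"`{name}`" if name.upper() in RESERVED else name


query = f"SELECT {quote_identifier('count')}, {quote_identifier('user_id')} FROM ks.events"
print(query)
```

Here count is wrapped in backticks because it is reserved, while user_id is left unchanged.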