Supported syntax of Spark SQL
Spark SQL supports a subset of the SQL-92 language.
The following syntax defines a SELECT query.
SELECT [DISTINCT] [column names]|[wildcard]
FROM [keyspace name.]table name
[JOIN clause table name ON join condition]
[WHERE condition]
[GROUP BY column name]
[HAVING conditions]
[ORDER BY column names [ASC | DESC]]
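As a rough illustration of how the optional clauses fit together, the sketch below fills in the template with made-up names. The keyspace sales, the table orders, and the columns customer_id, order_id, and status are assumptions for the example, not objects described in this reference.

```sql
-- Hypothetical keyspace "sales" and table "orders"; all names are illustrative.
-- Count shipped orders per customer and keep customers with more than 5 orders.
SELECT customer_id, COUNT(order_id) AS order_count
FROM sales.orders
WHERE status = 'shipped'
GROUP BY customer_id
HAVING COUNT(order_id) > 5
ORDER BY customer_id ASC;
```

The clauses appear in the same order as in the template: WHERE filters rows before grouping, and HAVING filters the groups afterwards.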
A SELECT query using joins has the following syntax.
SELECT statement
FROM statement
[JOIN | INNER JOIN | LEFT JOIN | LEFT SEMI JOIN | LEFT OUTER JOIN | RIGHT JOIN | RIGHT OUTER JOIN | FULL JOIN | FULL OUTER JOIN]
ON join condition
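A join following this template might look like the sketch below. The tables sales.orders and sales.customers and their columns are hypothetical and only serve to show where each part of the template goes.

```sql
-- Hypothetical tables; names are illustrative only.
-- LEFT OUTER JOIN keeps every row from orders, with NULLs where no customer matches.
SELECT o.order_id, c.name
FROM sales.orders o
LEFT OUTER JOIN sales.customers c
ON o.customer_id = c.customer_id;
```

Swapping LEFT OUTER JOIN for INNER JOIN would drop the orders that have no matching customer row.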
Several SELECT statements can be combined in a UNION, INTERSECT, or EXCEPT query.
SELECT statement 1
[UNION | UNION ALL | UNION DISTINCT | INTERSECT | EXCEPT]
SELECT statement 2
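The set operators can be sketched as follows, assuming two hypothetical tables, sales.orders and sales.returns, that both have a customer_id column. Both SELECT statements must return the same number of columns with compatible types.

```sql
-- Hypothetical tables; names are illustrative only.
-- UNION ALL keeps duplicate rows; UNION DISTINCT would remove them.
SELECT customer_id FROM sales.orders
UNION ALL
SELECT customer_id FROM sales.returns;

-- EXCEPT returns customers who placed an order but never made a return.
SELECT customer_id FROM sales.orders
EXCEPT
SELECT customer_id FROM sales.returns;
```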
Note: SELECT queries run on new columns return '' (empty results) instead of None.
The following syntax defines an INSERT query.
INSERT [OVERWRITE] INTO [keyspace name.]table name
VALUES values
The following syntax defines a CACHE TABLE query.
CACHE TABLE table name [AS table alias]
You can remove a table from the cache using an UNCACHE TABLE query.
UNCACHE TABLE table name
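A typical caching sequence might look like the sketch below, which uses only the basic form without the optional alias. The table name orders is a hypothetical placeholder.

```sql
-- Hypothetical table "orders"; the name is illustrative only.
-- Cache the table, query it, then release the cached data when it is no longer needed.
CACHE TABLE orders;
SELECT COUNT(*) FROM orders;
UNCACHE TABLE orders;
```

Caching is most useful when the same table is queried repeatedly in a session; uncaching afterwards frees the memory for other work.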
Keywords in Spark SQL
The following keywords are reserved in Spark SQL.
- ALL
- AND
- AS
- ASC
- APPROXIMATE
- AVG
- BETWEEN
- BY
- CACHE
- CAST
- COUNT
- DESC
- DISTINCT
- FALSE
- FIRST
- LAST
- FROM
- FULL
- GROUP
- HAVING
- IF
- IN
- INNER
- INSERT
- INTO
- IS
- JOIN
- LEFT
- LIMIT
- MAX
- MIN
- NOT
- NULL
- ON
- OR
- OVERWRITE
- LIKE
- RLIKE
- UPPER
- LOWER
- REGEXP
- ORDER
- OUTER
- RIGHT
- SELECT
- SEMI
- STRING
- SUM
- TABLE
- TIMESTAMP
- TRUE
- UNCACHE
- UNION
- WHERE
- INTERSECT
- EXCEPT
- SUBSTR
- SUBSTRING
- SQRT
- ABS