Use DSE geometric types in Apache Spark

About this task

DataStax Enterprise (DSE) geometric types can be used in DSE Spark. This example shows how to create a table containing geometric types using CQL, and then read from and write to that table in the DSE Spark shell.

Procedure

  1. Start cqlsh.

    language-bash
    cqlsh
  2. Create the test keyspace for the geometric data.

    language-cql
    CREATE KEYSPACE IF NOT EXISTS test WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor': 1 };
  3. Create the table to store the geometric data.

    language-cql
    CREATE TABLE IF NOT EXISTS test.geo (
        k INT PRIMARY KEY,
        pnt 'PointType',
        line 'LineStringType',
        poly 'PolygonType');
  4. Insert a test row.

    language-cql
    INSERT INTO test.geo (k, pnt, line, poly) VALUES
        (1,
        'POINT (1.1 2.2)',
        'LINESTRING (30 10, 10 30, 40 40)',
        'POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))');
  5. Exit cqlsh.

    language-cql
    QUIT;
  6. Start a Spark shell.

    language-bash
    dse spark
  7. Import the geometric types, type converters, and save modes.

    language-scala
    import com.datastax.driver.dse.geometry._
    import com.datastax.spark.connector.types.DseTypeConverter.{LineStringConverter, PointConverter, PolygonConverter}
    import org.apache.spark.sql.SaveMode
  8. Read the Point value from the test row in test.geo.

    language-scala
    // Geometric columns are read as WKT strings; parse the string into a Point.
    val results = spark.read.cassandraFormat("geo", "test").load().select("pnt").collect()
    val point1 = Point.fromWellKnownText(results(0).getString(0))
  9. Write a Polygon value to a new row in test.geo.

    language-scala
    val polygon1 = new Polygon(
        new Point(30, 10),
        new Point(40, 40),
        new Point(20, 40),
        new Point(10, 20),
        new Point(30, 10))
    // polygon1.toString yields the WKT string that is written to the poly column.
    val df = spark.createDataFrame(Seq((2, polygon1.toString))).select(col("_1") as "k", col("_2") as "poly")
    df.write.mode(SaveMode.Append).cassandraFormat("geo", "test").save()
  10. Check that the value was written to the test.geo table.

    language-scala
    spark.read.cassandraFormat("geo", "test").load().show()

    The output lists both rows:

    +---+--------------------+---------------+--------------------+
    |  k|                line|            pnt|                poly|
    +---+--------------------+---------------+--------------------+
    |  1|LINESTRING (30 10...|POINT (1.1 2.2)|POLYGON ((30 10, ...|
    |  2|                null|           null|POLYGON ((30 10, ...|
    +---+--------------------+---------------+--------------------+
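
The same read and write pattern can plausibly be applied to the line column. The following is a minimal sketch rather than part of the original procedure. It assumes the Spark shell session and the test.geo table from the steps above, and it assumes that LineString values are exchanged as WKT strings in the same way as the Point and Polygon values. The row key 3, the coordinates, and the variable names are illustrative only.

```scala
// Sketch only: run in the DSE Spark shell (dse spark) after completing the steps above.
// Assumption: LineString columns are read and written as WKT strings, like pnt and poly.
import com.datastax.driver.dse.geometry._
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.cassandra._
import org.apache.spark.sql.functions.col

// Read the LineString stored in the row with k = 1 and parse its WKT text.
val lineRows = spark.read.cassandraFormat("geo", "test").load().where("k = 1").select("line").collect()
val line1 = LineString.fromWellKnownText(lineRows(0).getString(0))

// Build a new LineString (illustrative coordinates) and append it as a row with k = 3.
val line2 = new LineString(new Point(0.0, 0.0), new Point(5.0, 5.0), new Point(10.0, 0.0))
val lineDf = spark.createDataFrame(Seq((3, line2.toString))).select(col("_1") as "k", col("_2") as "line")
lineDf.write.mode(SaveMode.Append).cassandraFormat("geo", "test").save()
```

If the write succeeds, rerunning step 10 should show a third row in which line is populated and pnt and poly are null.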
