Using the Northwind demo graph with Spark OLAP jobs

The Northwind demo, included with the DSE demos, provides a script that builds a graph from the data of a fictional trading company.

In this task, you’ll use the Gremlin console to create the Northwind graph, snapshot part of the graph, and run a count operation on the subgraph using the SparkGraphComputer.

  1. Enable DataStax Graph, DSE Search, and DSE Analytics modes in your datacenter.

  2. Install the DataStax Bulk Loader.

    The DataStax Graph Loader supports only classic DSE Graph (DSE 5.1 to 6.7). In DataStax Enterprise (DSE) 6.8, classic DSE Graph was replaced by DataStax Graph, so DataStax recommends using DataStax Bulk Loader to load graph data in DSE 6.8 and later.

  3. Clone the graph-examples repository to the machine on which you are running the Gremlin console.

    git clone https://github.com/datastax/graph-examples.git
  4. Use the DataStax Bulk Loader to load the sample graph data from the repository.

  5. Start the Gremlin console:

    dse gremlin-console
  6. Alias the traversal to the Northwind graph using the default OLTP traversal source:

    :remote config alias g northwind.g
  7. Set the schema mode to Development.

    To allow modifying the schema for the connected graph database, you must set the mode to Development each session. The default schema mode for DataStax Graph is Production, which doesn’t allow you to modify the graph’s schema.

    schema.config().option('graph.schema_mode').set('Development')
  8. Enable the use of scans and lambdas.

    schema.config().option('graph.allow_scan').set('true')
    graph.schema().config().option('graph.traversal_sources.g.restrict_lambda').set(false)
  9. Look at the schema of the Northwind graph:

    schema.describe()
  10. Alias the traversal to the Northwind analytics OLAP traversal source, a.

    Alias g to the OLAP traversal source for one-off analytic queries:

    :remote config alias g northwind.a
    ==>g=northwind.a
  11. Count the number of vertices using the OLAP traversal source:

    g.V().count()
    ==>3294

    When you alias g to the OLAP traversal source database_name.a (here, northwind.a), DSE Analytics is the workload back end.

  12. Store subgraphs into snapshots using graph.snapshot().

    When you need to run multiple OLAP queries on a graph in one session, use snapshots of the graph as the traversal source.

    employees = graph.snapshot().vertices('employee').create()
    ==>graphtraversalsource[hadoopgraph[persistedinputrdd->persistedoutputrdd], sparkgraphcomputer]
    categories = graph.snapshot().vertices('category').create()
    ==>graphtraversalsource[hadoopgraph[persistedinputrdd->persistedoutputrdd], sparkgraphcomputer]

    The snapshot() method returns an OLAP traversal source using the SparkGraphComputer.

  13. Run an operation on the snapshot graphs.

    Count the number of employee vertices in the snapshot graph:

    employees.V().count()
    ==>9

    Count the number of category vertices in the snapshot graph:

    categories.V().count()
    ==>8
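
The counts above are only one kind of query a snapshot can serve. As a minimal sketch, assuming the employees and categories snapshots from step 12 are still defined in the same console session, other read-only traversals can be run against the same traversal sources. The steps used here (hasLabel(), groupCount(), and by(label)) are standard Apache TinkerPop Gremlin steps rather than anything specific to this demo. Output is omitted because it depends on the loaded data.

    // Group the vertices in each snapshot by label to confirm what was captured
    employees.V().groupCount().by(label)
    categories.V().groupCount().by(label)

    // Filter by label explicitly; on a single-label snapshot this should match V().count()
    employees.V().hasLabel('employee').count()

Because each snapshot is its own OLAP traversal source, these queries reuse the persisted subgraph rather than the full northwind.a source.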
