What's new in DataStax Graph

DataStax Graph (DSG) 6.8 introduces many changes that simplify working with graph data. Classic Graph (DSE Graph 6.7 and earlier) is still available in DSG 6.8, but the new Core engine features make working with graph data through both Gremlin and the Cassandra Query Language (CQL) straightforward.

Additions

DSG provides improved, more detailed error messages, especially for queries that require an index that does not exist. These messages contain:
  • the traversal that was executed and failed
  • details of the step that failed
  • the CQL that was executed and failed
  • a suggested index that would allow the traversal to execute
  • alternative approaches to creating an index
For example, the following traversal filters vertices with label a on the unindexed property age:
gremlin> g.V().hasLabel("a").has("age", 23)
One or more indexes are required to execute the traversal: g.V().hasLabel("a").has("age",(int) 23)
Failed step: __.V().hasLabel("a").has("age",(int) 23)
CQL execution: No table or view could satisfy the query 'SELECT * FROM bla.a WHERE age = ?'
The output of 'schema.indexFor(<your_traversal>).analyze()' suggests the following indexes could be created to allow execution:

schema.vertexLabel('a').materializedView('a_by_age').ifNotExists().partitionBy('age').clusterBy('id', Asc).create()

Alternatively consider using:
g.with('ignore-unindexed') to ignore unindexed traversal. Your results may be incomplete.
g.with('allow-filtering') to allow filtering. This may have performance implications.

DSG includes an index analyzer, indexFor(), that suggests the indexes a query requires. Given a traversal, the index analyzer returns either a suggested index or an indication that an existing index already satisfies the traversal.
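The decision the index analyzer makes can be pictured with a toy Python model. This is not DSG's implementation: VertexLabel, analyze(), and the views registry are hypothetical names, and the model only handles a single equality filter. When no existing key or view covers the filtered property, it emits a suggestion shaped like the materialized view in the error message above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VertexLabel:
    name: str
    partition_key: List[str]
    # Hypothetical registry of existing materialized views: view name -> key columns.
    views: Dict[str, List[str]] = field(default_factory=dict)


def analyze(label: VertexLabel, prop: str) -> str:
    """Report how a has(prop, value) filter on `label` could be served (toy model)."""
    # A filter on the whole partition key is answered by the base table itself.
    if label.partition_key == [prop]:
        return f"served by primary key of {label.name}"
    # Otherwise look for an existing view partitioned by the property.
    for view, key in label.views.items():
        if key == [prop]:
            return f"served by existing view {view}"
    # No match: suggest a view partitioned by the property, clustered by the base key.
    view = f"{label.name}_by_{prop}"
    clustering = "".join(f".clusterBy('{col}', Asc)" for col in label.partition_key)
    return (
        f"schema.vertexLabel('{label.name}').materializedView('{view}')"
        f".ifNotExists().partitionBy('{prop}'){clustering}.create()"
    )
```

For a label a keyed by id, analyze(VertexLabel("a", ["id"]), "age") returns the same suggestion string shown in the error message; once a view named a_by_age is registered, it reports that view instead.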

DSG has improved the Schema API to simplify creating and modifying vertex labels, edge labels, and indexes. The Schema API now aligns more closely with Cassandra terminology: you specify partition keys and clustering columns that correspond directly to the CQL PRIMARY KEY.
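As a sketch of that correspondence, the helper below (a hypothetical name, not part of any DataStax API) renders the CQL PRIMARY KEY clause implied by a list of partition key columns and a list of clustering columns. It follows standard CQL syntax, where a composite partition key gets its own inner parentheses.

```python
from typing import Sequence


def primary_key_clause(partition: Sequence[str], clustering: Sequence[str] = ()) -> str:
    """Render a CQL PRIMARY KEY clause from partition and clustering columns."""
    if not partition:
        raise ValueError("at least one partition key column is required")
    # A single partition key column stands alone; a composite one is parenthesized.
    if len(partition) == 1:
        part = partition[0]
    else:
        part = "(" + ", ".join(partition) + ")"
    # Clustering columns follow the partition key, in declaration order.
    return "PRIMARY KEY (" + ", ".join([part, *clustering]) + ")"
```

For example, a label partitioned by city and zip and clustered by person_id maps to PRIMARY KEY ((city, zip), person_id), while a label partitioned only by person_id maps to PRIMARY KEY (person_id).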

DSG uses a more transparent data model: a graph is a CQL keyspace, each vertex or edge label is stored in its own CQL table (a 1:1 correspondence), and the properties of each label are stored in CQL columns. As a result, existing CQL data can be stored as usual while the same keyspace is also treated as a graph for traversals. CQL grammar can be used to specify the keyspace and table metadata required to perform graph queries.
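The toy Python model below (not DSG code) makes the mapping concrete: the graph is a dict of tables, each label is a list of rows, and each property is a column. The graph name social, the edge table name person__knows__person, and the out_/in_ column prefixes are hypothetical choices for illustration. Following an outgoing edge is then just a lookup over ordinary table rows.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical graph "social": keyspace -> tables -> rows of column values.
social: Dict[str, List[Row]] = {
    "person": [
        {"person_id": 1, "name": "Ada"},
        {"person_id": 2, "name": "Alan"},
        {"person_id": 3, "name": "Grace"},
    ],
    # Hypothetical edge table for the 'knows' label: one row per edge.
    "person__knows__person": [
        {"out_person_id": 1, "in_person_id": 2, "since": 2019},
        {"out_person_id": 1, "in_person_id": 3, "since": 2021},
    ],
}


def out_rows(keyspace: Dict[str, List[Row]], vertex_table: str,
             edge_table: str, key: str, vertex_id: Any) -> List[Row]:
    """Follow outgoing edges from one vertex by reading plain table rows."""
    targets = {
        row[f"in_{key}"]
        for row in keyspace[edge_table]
        if row[f"out_{key}"] == vertex_id
    }
    return [row for row in keyspace[vertex_table] if row[key] in targets]
```

Calling out_rows(social, "person", "person__knows__person", "person_id", 1) returns the rows for Alan and Grace.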

DSG data types are aligned with CQL, and all CQL types, including collections, tuples, and user-defined types (UDTs), are supported.

The DataStax Bulk Loader, CQL, or GraphFrames can be used to ingest data.

The profile() output details each step of a traversal and the CQL statements that step executes. CQL statements are grouped and include duration information, which makes the improved format useful for troubleshooting.
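The kind of grouping described above can be sketched in a few lines of Python. This is an assumption-laden toy, not the format profile() actually prints: group_cql() is a hypothetical helper that takes (step, CQL text, duration in milliseconds) records and collects identical statements per step with a count and a total duration.

```python
from typing import Dict, Iterable, Tuple

Event = Tuple[str, str, float]  # (step, cql, duration_ms)


def group_cql(events: Iterable[Event]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Group (step, cql, duration_ms) records by step and CQL text."""
    grouped: Dict[Tuple[str, str], Dict[str, float]] = {}
    for step, cql, duration_ms in events:
        entry = grouped.setdefault((step, cql), {"count": 0, "total_ms": 0.0})
        entry["count"] += 1
        entry["total_ms"] += duration_ms
    # Insertion order is preserved, so groups appear in step order.
    return grouped
```

Repeated executions of the same statement by one step, for example one lookup per incoming vertex, collapse into a single entry, which is what makes a slow or frequently repeated statement easy to spot.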

If edges are indexed, they can be queried directly, rather than only through a vertex as in Classic Graph. Search indexes can also be created on edges; they are particularly useful for tokenized edge queries, but also support any other predicate that search supports.

DSG supports a dev traversal source that allows queries to run without indexes. This approach relies on full graph scans, but it is useful for early exploration of a graph.
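The trade-off can be illustrated with a toy Python comparison (hypothetical helpers, not the dev traversal source itself): a full scan needs no index and returns the same rows as an indexed lookup, but it has to examine every row to do so.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical sample data: ten vertices with a repeating age property.
people: List[Row] = [{"id": i, "age": 20 + i % 5} for i in range(10)]


def full_scan(rows: List[Row], prop: str, value: Any) -> Tuple[List[Row], int]:
    """Scan-style lookup: read every row, no index required."""
    examined, hits = 0, []
    for row in rows:
        examined += 1
        if row[prop] == value:
            hits.append(row)
    return hits, examined


def build_index(rows: List[Row], prop: str) -> Dict[Any, List[Row]]:
    """Index-style lookup table: property value -> matching rows."""
    index: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        index[row[prop]].append(row)
    return index
```

With ten rows, full_scan(people, "age", 23) examines all ten to find ids 3 and 8, while build_index(people, "age")[23] reaches the same rows directly; the gap grows with the size of the graph.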

Deprecations

Configuration options such as allow_scan, schema_mode, and evaluation_timeout have been removed entirely. The only traversal configuration that DSG supports is the documented set of with() options, such as g.with('allow-filtering').

The Classic engine still supports all settings in dse.yaml. For DSG, however, only the following settings are supported:

  • analytic_evaluation_timeout_in_ms: Maximum time to wait for an OLAP analytic (Spark) traversal to evaluate
  • realtime_evaluation_timeout_in_ms: Maximum time to wait for an OLTP real-time traversal to evaluate
  • system_evaluation_timeout_in_ms: Maximum time to wait for a graph-system request to evaluate. Creating or dropping a graph is an example of a graph-system request
  • gremlin_server: Options that configure the Gremlin Server

Transaction handling has changed:
  • Calls to the tx() graph API throw an unsupported operation exception.
  • All mutations are executed only after a traversal has been fully iterated. If a node fails, there is no guarantee that this will not result in a partial commit.
  • Mutations are not visible while a traversal is executing. For example, g.addV('person').V() does not return the newly added vertex.
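The visibility rule can be modeled with a small Python toy. ToyTraversal is a hypothetical class, not a DSG or TinkerPop API; it tracks a single traverser and only vertex labels. Mutations are buffered while the steps run and applied only once the traversal is exhausted, so a later V() step in the same traversal reads only previously committed data.

```python
from typing import Any, List, Optional, Tuple


class ToyTraversal:
    """Toy model: mutations are buffered and applied when the traversal is exhausted."""

    def __init__(self, store: List[Any]):
        self.store = store  # committed vertex labels, shared across traversals
        self.steps: List[Tuple[str, Optional[str]]] = []

    def addV(self, label: str) -> "ToyTraversal":
        self.steps.append(("addV", label))
        return self

    def V(self) -> "ToyTraversal":
        self.steps.append(("V", None))
        return self

    def to_list(self) -> List[Any]:
        pending: List[Any] = []
        result: List[Any] = []
        for op, arg in self.steps:
            if op == "addV":
                pending.append(arg)  # buffered, not yet visible to later steps
                result = [arg]
            else:  # "V": reads committed data only
                result = list(self.store)
        self.store.extend(pending)  # applied once the traversal is exhausted
        return result
```

Starting from an empty graph, ToyTraversal(graph).addV("person").V().to_list() returns an empty list, yet a following ToyTraversal(graph).V().to_list() sees the new vertex. The toy does not model partial commits on node failure.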

Support for multi-properties and meta-properties has been dropped. Because DSG is more closely aligned with Apache Cassandra and CQL, other mechanisms replace them: row-level access control and role-based access control can control access that was previously implemented with meta-properties, and collections and tuples can store the complex data that was previously stored in multi- and meta-properties.
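One way to picture the migration is a Classic multi-property, such as a person's past locations with since and until meta-properties, becoming a single column that holds a list of tuples. The Python helper below is a hypothetical sketch of that reshaping, not a DataStax migration tool; the property and meta-property names are assumptions.

```python
from typing import Any, Dict, List, Sequence, Tuple


def to_tuples(values: Sequence[Tuple[Any, Dict[str, Any]]],
              meta_keys: Sequence[str]) -> List[tuple]:
    """Flatten (value, meta-properties) pairs into (value, meta1, meta2, ...) tuples."""
    # Missing meta-properties become None so every tuple has the same arity.
    return [(value, *(meta.get(k) for k in meta_keys)) for value, meta in values]
```

For example, [("NYC", {"since": 2010, "until": 2015}), ("SF", {"since": 2015})] with meta keys since and until becomes [("NYC", 2010, 2015), ("SF", 2015, None)], a shape that fits a CQL list of tuples.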

The graph API has been removed, so graph.addVertex() and addEdge() cannot be used. Use the graph traversal API instead, with g.addV() and g.addE(). In addition, elements returned from traversals are reference elements that carry only the primary key. Use valueMap().by(unfold()) to retrieve any other data with a traversal.
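Why the by(unfold()) modulator is needed can be shown with a Python toy of the result shapes (hypothetical helpers, not gremlinpython calls). valueMap() wraps each property value in a list, and by(unfold()) keeps the first unfolded item of each list, which for DSG's single-valued properties is the value itself.

```python
from typing import Any, Dict, List


def value_map(properties: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Like valueMap(): each property value comes back wrapped in a list."""
    return {key: [value] for key, value in properties.items()}


def by_unfold(vm: Dict[str, List[Any]]) -> Dict[str, Any]:
    """Like .by(unfold()): keep the first unfolded item of each list."""
    return {key: values[0] for key, values in vm.items() if values}
```

For a vertex with name Ada and age 36, value_map() yields {"name": ["Ada"], "age": [36]}, and by_unfold() turns that back into {"name": "Ada", "age": 36}.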

In DSE Graph 6.7 and earlier, all edges were bidirectional by default. In DSG, for performance reasons, edges are always created as unidirectional. As a result, a traversal that walks an edge label in the direction opposite to the one defined in the schema requires a materialized view index on that edge label. The index analyzer, indexFor(), can help create this index, or the index can be created with the inverse() step of the Schema API.
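A toy Python model of a unidirectional edge label shows why the extra index is needed. ToyEdgeTable and its methods are hypothetical names: edges are stored once, keyed by the out vertex, so out() is a direct lookup, while in_() fails until an inverse index keyed by the in vertex has been built. Once built, the inverse index is kept up to date as new edges are added.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


class ToyEdgeTable:
    """Edges stored once, keyed by out-vertex id (unidirectional)."""

    def __init__(self) -> None:
        self.by_out: Dict[int, Set[int]] = defaultdict(set)
        self.by_in: Optional[Dict[int, Set[int]]] = None  # inverse index, absent by default

    def add(self, out_id: int, in_id: int) -> None:
        self.by_out[out_id].add(in_id)
        if self.by_in is not None:  # keep the inverse index in sync once it exists
            self.by_in[in_id].add(out_id)

    def create_inverse_index(self) -> None:
        self.by_in = defaultdict(set)
        for out_id, in_ids in self.by_out.items():
            for in_id in in_ids:
                self.by_in[in_id].add(out_id)

    def out(self, vertex_id: int) -> List[int]:
        return sorted(self.by_out.get(vertex_id, ()))

    def in_(self, vertex_id: int) -> List[int]:
        if self.by_in is None:
            raise LookupError("traversing against the edge direction requires an inverse index")
        return sorted(self.by_in.get(vertex_id, ()))
```

After add(1, 2), out(1) returns [2] immediately, but in_(2) raises LookupError until create_inverse_index() is called.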

In DSE Graph 6.7 and earlier, graph and vertex queries could be cached. This option was removed in DSG.

In DSG, TTL can be set on the schema only by using CQL.

DSG does not support constructing element IDs externally; an ID used for a lookup must be obtained directly from an element.

Lambda functions are no longer supported.

The DataStax Graphloader is deprecated and is supported in DSG only for Classic graphs. Users can ingest data using CQL or a bulk loading tool such as GraphFrames or DataStax Bulk Loader.