What’s new in DataStax Graph
DataStax Graph (DSG) 6.8 introduces many changes that simplify working with graph data. Classic Graph (DSE Graph 6.7 and earlier) is still available in DataStax Graph 6.8, but the new core features make working with graph data through both Gremlin and Cassandra Query Language (CQL) straightforward.
DSG has improved, more detailed error messages, especially for queries that require indexes that do not exist. These messages contain:
the executed traversal that failed
details of the step that failed
the CQL that was executed and failed
a suggested index that would allow the traversal to execute
alternative approaches to creating an index
gremlin> g.V().hasLabel("a").has("age", 23)
One or more indexes are required to execute the traversal: g.V().hasLabel("a").has("age",(int) 23)
Failed step: __.V().hasLabel("a").has("age",(int) 23)
CQL execution: No table or view could satisfy the query 'SELECT * FROM bla.a WHERE age = ?'
The output of 'schema.indexFor(<your_traversal>).analyze()' suggests the following indexes could be created to allow execution:

schema.vertexLabel('a').materializedView('a_by_age').ifNotExists().partitionBy('age').clusterBy('id', Asc).create()

Alternatively consider using:
g.with('ignore-unindexed') to ignore unindexed traversal. Your results may be incomplete.
g.with('allow-filtering') to allow filtering. This may have performance implications.
DSG includes an index analyzer, indexFor(), that suggests the indexes required to complete a query.
Given a query, the index analyzer either returns a suggested index or indicates that an existing index already fulfills the requirements.
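A minimal sketch of calling the analyzer from the Gremlin console, reusing the traversal from the error message above (the label `a` and property `age` come from that example). `analyze()` only reports what is needed. The `apply()` call is an assumption about the 6.8 Schema API that creates the suggested indexes, so verify it against the reference for your release.

```groovy
// Ask the index analyzer what this traversal needs; this only reports.
schema.indexFor(g.V().hasLabel('a').has('age', 23)).analyze()

// Assumed API: create the indexes that analyze() suggested.
schema.indexFor(g.V().hasLabel('a').has('age', 23)).apply()

// With the index in place, the original traversal no longer fails for lack of an index.
g.V().hasLabel('a').has('age', 23)
```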
DSG has improved the Schema API to simplify the creation and modification of vertex and edge labels, as well as indexes.
The Schema API aligns more closely with Cassandra terminology: partition keys and clustering columns are specified explicitly and correspond to the same concepts in CQL.
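To illustrate that alignment, the sketch below defines two hypothetical vertex labels whose partition keys and clustering columns map onto the CQL primary key of the backing table. The label names, property names, and types are assumptions chosen for illustration.

```groovy
// Hypothetical 'person' label: person_id is the partition key,
// i.e. PRIMARY KEY ((person_id)) in the backing CQL table.
schema.vertexLabel('person').
    ifNotExists().
    partitionBy('person_id', Int).
    property('name', Text).
    property('age', Int).
    create()

// Hypothetical 'rating' label with a clustering column: user_id partitions
// the data and rated_at orders rows within a partition,
// i.e. PRIMARY KEY ((user_id), rated_at) in CQL.
schema.vertexLabel('rating').
    ifNotExists().
    partitionBy('user_id', Int).
    clusterBy('rated_at', Timestamp).
    property('stars', Int).
    create()
```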
DataStax Graph uses a more transparent data model: a graph is a CQL keyspace, each vertex and edge label is stored in its own CQL table (a 1:1 correspondence), and the properties of vertex and edge labels are stored in CQL columns. As a result, data can be stored and queried with CQL as usual, while the same keyspace is also treated as a graph for graph traversals. CQL grammar can be used to specify the keyspace and table metadata required to perform graph queries.
DSG data types have been aligned with CQL, and all CQL types, including collections, tuples, and user-defined types (UDTs), are supported.
The DataStax Bulk Loader, CQL, or GraphFrames can be used to ingest data.
profile() output details each step of a given traversal and the CQL that each step executes.
CQL statements are grouped and include duration information.
The improved format is helpful for troubleshooting.
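For example, appending `profile()` to a traversal returns the per-step breakdown instead of the query results. The `person` label, `person_id` key, and `knows` edge label are hypothetical.

```groovy
// profile() replaces the normal results with step-by-step metrics,
// including the grouped CQL statements and their durations.
g.V().has('person', 'person_id', 1).out('knows').profile()
```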
If edges are indexed, they can be queried directly rather than through a vertex as in Classic Graph. Search indexes can also be used on edges. They are particularly useful for tokenized edge queries, but also support any other predicate that search supports.
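A minimal sketch, assuming a hypothetical `knows` edge label with an integer `since` property, and assuming the index analyzer accepts edge traversals in your release. The sketch first asks what the edge query needs, then queries the edges directly once an index exists.

```groovy
// Hypothetical edge label and property; analyze() reports any missing index.
schema.indexFor(g.E().hasLabel('knows').has('since', 2010)).analyze()

// With a suitable index in place, edges are read directly,
// without starting the traversal from a vertex.
g.E().hasLabel('knows').has('since', 2010)
```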
DSG supports a dev traversal source that allows queries to be performed without indexes.
This method uses full graph scans, but it is useful for early exploration of a graph.
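A sketch of the difference, assuming the `dev` source is bound as `dev` in your console session and that a hypothetical `person` label has an unindexed `age` property:

```groovy
// With g, filtering on an unindexed property fails with an index error
// like the one shown earlier.
g.V().hasLabel('person').has('age', 23)

// The dev source runs the same query as a full scan: acceptable while
// exploring a small development graph, too slow for production use.
dev.V().hasLabel('person').has('age', 23)
```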
Configuration options such as evaluation_timeout were completely removed.
The only traversal configuration options that DSG supports are the documented with() options.
Classic Engine still supports all settings in
However, for DSG, the following settings are supported:
analytic_evaluation_timeout_in_ms: Maximum time to wait for an OLAP analytic (Spark) traversal to evaluate
realtime_evaluation_timeout_in_ms: Maximum time to wait for an OLTP real-time traversal to evaluate
system_evaluation_timeout_in_ms: Maximum time to wait for a graph-system request to evaluate. Creating or dropping a graph is an example of a graph-system request
gremlin_server: Options that configure the Gremlin Server
Transaction handling has changed:
tx() graph API calls throw an unsupported operation exception.
All mutations are executed once a traversal has been exhausted. There is no guarantee against partial commits in the event of a node failure.
Mutations are no longer visible during the execution of a traversal. For instance:
g.addV('person').V() will not return the newly added vertex.
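One way to work with this behavior is to split the write and the read into separate traversals, so the read runs only after the writing traversal has been exhausted. The `person` label and its properties are hypothetical.

```groovy
// Traversal 1: the mutation is executed once this traversal is exhausted.
g.addV('person').property('person_id', 1).property('name', 'Bob').iterate()

// Traversal 2: a separate traversal can now see the new vertex.
g.V().has('person', 'person_id', 1).valueMap().by(unfold())
```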
Multi- and meta-property support has been dropped. Other mechanisms replace these elements, now that DSG is more closely aligned with Apache Cassandra and Cassandra Query Language. Row-level and role-based access control can be used to control access that was previously implemented with meta-properties. Collections and tuples can be used to store complex data that was previously stored in multi- and meta-properties.
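As an illustration, a Classic multi-property such as several nicknames could become a collection property, and a value that once carried metadata in a meta-property could become a tuple. The `customer` label, its properties, and the `setOf`/`tupleOf` type helpers are assumptions based on the 6.8 Schema API; check the data type reference for your release.

```groovy
// Hypothetical schema replacing Classic multi- and meta-properties.
schema.vertexLabel('customer').
    ifNotExists().
    partitionBy('customer_id', Int).
    // Instead of a multi-property: one set column holds every nickname.
    property('nicknames', setOf(Text)).
    // Instead of a meta-property: a tuple keeps the address with the date it took effect.
    property('address_since', tupleOf(Text, Date)).
    create()
```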
The graph API was removed; addEdge() cannot be used. Use the graph traversal API instead.
In addition, elements that are returned from traversals are reference elements and only return the primary key. Use valueMap().by(unfold()) to retrieve any other data with a traversal.
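A sketch of the difference, assuming a hypothetical `person` label keyed by `person_id` with a `name` property:

```groovy
// Returns a reference vertex that carries only its primary key.
g.V().has('person', 'person_id', 1)

// Returns the property values; by(unfold()) unwraps each value from the
// single-element list that valueMap() would otherwise produce.
g.V().has('person', 'person_id', 1).valueMap().by(unfold())
```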
In DSE Graph 6.7 and earlier, all edges were bidirectional by default.
For performance reasons, edges are now always created as unidirectional.
Consequently, traversing an edge in the opposite direction from the one defined in the schema requires a materialized view index on the edge label.
The index analyzer, indexFor(), can help create this index, or the inverse() step can be used to create it.
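For example, assuming a hypothetical `knows` edge label from `person` to `person`, an inverse materialized view lets traversals follow `knows` edges from the incoming side. The view name is arbitrary and hypothetical. The chain follows the same Schema API style as the materialized view in the error message earlier.

```groovy
// Edge label as defined in the schema: person -knows-> person.
schema.edgeLabel('knows').
    ifNotExists().
    from('person').to('person').
    create()

// Inverse materialized view: stores the edges keyed by the incoming vertex,
// so in('knows') can be answered without a full scan.
schema.edgeLabel('knows').
    from('person').to('person').
    materializedView('person__knows__person_by_in_person_id').
    ifNotExists().
    inverse().
    create()

// Traverses against the schema direction: who knows person 1?
g.V().has('person', 'person_id', 1).in('knows')
```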
In DSE Graph 6.7 and earlier, graph and vertex queries could be cached. This option was removed in DSG.
In DSG, schema-level TTL can only be set using CQL.
DSG does not support constructing IDs externally; IDs must be obtained directly from elements if they are to be used for lookups.
Lambda functions are no longer supported.
The DataStax Graphloader is deprecated and not supported in DSG except for Classic use. Users can ingest data using CQL or a bulk loading tool like GraphFrames or DataStax Bulk Loader.