Apache TinkerPop™ API support in DseGraphFrame

DseGraphFrame supports a subset of the Apache TinkerPop traversal API.

DseGraphFrame does not support org.apache.tinkerpop.gremlin.process.traversal.Traverser or org.apache.tinkerpop.gremlin.process.traversal.TraversalSideEffects.

Supported methods

DseGraphFrame mimics the TinkerPop graph traversal source by defining two methods: E() and V(). These methods return a GraphTraversal that exposes the methods listed below. Only a limited set of the TinkerPop Step classes is supported; any step not listed in the following table throws an UnsupportedException.

Steps	Methods
`CountGlobalStep`	`count()`
`GroupCountStep`	`groupCount()`
`IdStep`	`id()`
`PropertyValuesStep`	`values()`
`PropertyMapStep`	`propertyMap()`
`HasStep`	`has()`, `hasLabel()`
`IsStep`	`is()`
`VertexStep`	`to()`, `out()`, `in()`, `both()`, `toE()`, `outE()`, `inE()`, `bothE()`
`EdgeVertexStep`	`toV()`, `inV()`, `outV()`, `bothV()`
`NotStep`	`not()`
`TraversalFilterStep`	`where()`
`AndStep`	`and(A,B)`
`PageRankVertexProgramStep`	`pageRank()`
`DedupGlobalStep`	`dedup()`
`OrderGlobalStep`	`order()`
`LimitGlobalStep`	`limit()`
`SelectStep`	`as()` and `select()`
`OrStep`	`or()`

Examples

This query finds people who know each other and demonstrates the as() and select() methods:

g.V().as("a").out("knows").as("b").out("knows")
  .where(P.eq("a")).select("a", "b").by("name").show

+-----+-----+
|  a  |  b  |
+-----+-----+
|Alice| Bob |
| Bob |Alice|
+-----+-----+

Steps	Methods
`DropStep`	`V().drop()`, `E().drop()`, `properties().drop()`
`AddPropertyStep`	`property(name, value, ...)`

DseGraphFrame can be used to drop millions of vertices or edges at once, and is much faster for bulk property updates than Gremlin OLAP or OLTP.

For example, this query drops all person vertices and their associated edges:

g.V().hasLabel("person").drop().iterate()

Using DseGraphFrame in Scala

GraphTraversal is a Java interface that extends the Java Iterator interface. To work with the results of a traversal as a DataFrame, use the df() method. DseGraphFrame also supports implicit conversion to DataFrame.

The following example traverses the vertices of a graph using TinkerPop and then shows the result as a DataFrame.

g.V().out().show

In some cases you may need to use the TinkerPop Java API to get the correct TinkerPop objects.

For example, to extract the DataStax Graph Id object, convert the Traversal Java iterator to a Scala iterator, which gives direct access to the TinkerPop representation of the Id. This approach lets you use the original Id, whereas the DataFrame methods return the DataFrame String representation of the Id. You can also use the toList() and toSet() methods to collect the original Id objects.

import scala.collection.JavaConverters._
for (i <- g.V().id().asScala) println(i)

{~label=vertex, community_id=748226688, member_id=0}
{~label=custom, name=Name, value=1}

g.V.id.toSet

res18: java.util.Set[Object] = [{~label=demigod, community_id=224391936, member_id=0}, ...
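
The conversion used above is ordinary Java-to-Scala interop, so it can be tried without a graph. The following is a minimal sketch that assumes a plain java.util.Iterator as a stand-in for the traversal; the object name IdIterationSketch, its helper methods, and the string values are hypothetical placeholders, not real Id objects.

```scala
import scala.collection.JavaConverters._

object IdIterationSketch {
  // Hypothetical stand-in for g.V().id(): any java.util.Iterator converts the same way.
  def javaIds(): java.util.Iterator[AnyRef] = {
    val ids = new java.util.ArrayList[AnyRef]()
    ids.add("{~label=vertex, community_id=748226688, member_id=0}")
    ids.add("{~label=custom, name=Name, value=1}")
    ids.iterator()
  }

  // asScala wraps the Java iterator; toList then collects the original objects unchanged.
  def collectIds(): List[AnyRef] = javaIds().asScala.toList

  def main(args: Array[String]): Unit =
    for (i <- collectIds()) println(i)
}
```

Because asScala only wraps the iterator, the elements keep their original type, which is why this route preserves the Id objects rather than their String form.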

The TinkerPop P (predicate) and T (constant) classes are imported by the Apache Spark™ shell automatically.

g.E().groupCount().by(T.label)
g.V().has("age", P.gt(30)).show

For standalone applications, import these classes:

import org.apache.tinkerpop.gremlin.structure.T
import org.apache.tinkerpop.gremlin.process.traversal.P
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__

Scala is not always able to infer the return type, especially in the Spark shell. Provide the type of the property values explicitly:

g.V().values[Any]("name").next()

Or similarly:

val n: String = g.V().values("name").next()

Explicitly set the type when dropping properties:

g.V().properties[Any]("age", "name").drop().iterate()

In this case, using the DataFrame API is easier as you do not need to specify the type:

g.V().properties("age", "name").drop().show()

Result

++
||
++
++

g.V().values("age").show()

Result

+-----+
|  age|
+-----+
|10000|
+-----+

Result retrieval methods

hasNext()

You want to know if there’s a result, but you don’t care about the value.

For example, "did Alice create any other vertices?":

g.V().has("name", "Alice").outE("created").hasNext()

next()

You know that there is at least one result, and you want to get the first one. If called multiple times, it gets the next consecutive result.

For example, if you want to get the vertex label distribution:

g.V().groupCount().by(T.label).next()

Group steps will always return exactly one result.

iterate()

You want to execute the traversal, and you don’t care about the result or whether it did anything at all.

For example, to set all persons' ages to 10:

g.V().hasLabel("person").property("age", 10).iterate()

toList() and toSet()

You expect the result to contain an arbitrary number of items, and you want to get all of them.

For example, get all the people Alice knows:

g.V().has("name", "Alice").out("knows").toList()
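
Because a GraphTraversal is also a Java Iterator, the four retrieval styles can be compared on any iterator. The following is a minimal sketch that assumes a plain java.util.Iterator as a stand-in for a traversal; RetrievalSketch, names(), and drain() are hypothetical helpers written for illustration and are not part of DseGraphFrame or TinkerPop.

```scala
import scala.collection.JavaConverters._

object RetrievalSketch {
  // Hypothetical stand-in for a traversal: a Java iterator over three names.
  def names(): java.util.Iterator[String] =
    java.util.Arrays.asList("Bob", "Carol", "Dave").iterator()

  // Plays the role of iterate(): consume every result and discard the values.
  def drain[A](it: java.util.Iterator[A]): Unit =
    while (it.hasNext()) it.next()

  def main(args: Array[String]): Unit = {
    val it = names()
    println(it.hasNext())            // is there at least one result?
    println(it.next())               // the first result
    println(it.next())               // a second call returns the next consecutive result
    drain(it)                        // exhaust the remainder without looking at it
    println(it.hasNext())            // nothing is left after draining
    println(names().asScala.toList)  // collect everything at once, in the spirit of toList()
  }
}
```

The sketch also shows why next() should be called only when a result is known to exist: once the iterator is exhausted, a further next() call on a Java iterator throws NoSuchElementException.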
