TinkerPop API support in DseGraphFrame
DseGraphFrame supports a subset of the Apache TinkerPop traversal API.
DseGraphFrame does not support org.apache.tinkerpop.gremlin.process.traversal.Traverser or org.apache.tinkerpop.gremlin.process.traversal.TraversalSideEffects.
Supported methods
DseGraphFrame mimics the TinkerPop graph traversal source by defining two methods: E() and V(). These methods return a GraphTraversal that has all the methods listed below. Only a limited set of TinkerPop's Step classes is supported. Steps other than the ones in the following table will throw an UnsupportedException.
Steps | Methods |
---|---|
CountGlobalStep | count() |
GroupCountStep | groupCount() |
IdStep | id() |
PropertyValuesStep | values() |
PropertyMapStep | propertyMap() |
HasStep | has(), hasLabel() |
IsStep | is() |
VertexStep | to(), out(), in(), both(), toE(), outE(), inE(), bothE() |
EdgeVertexStep | toV(), inV(), outV(), bothV() |
NotStep | not() |
TraversalFilterStep | where() |
AndStep | and(A,B) |
PageRankVertexProgramStep | pageRank() |
DedupGlobalStep | dedup() |
OrderGlobalStep | order() |
LimitGlobalStep | limit() |
SelectStep | as() and select() |
OrStep | or() |
This query finds people who know each other and demonstrates the as() and select() methods:
g.V().as("a").out("knows").as("b").out("knows").where(P.eq("a")).select("a", "b").by("name").show
+-----+-----+
| a | b |
+-----+-----+
|Alice| Bob |
| Bob |Alice|
+-----+-----+
Steps | Methods |
---|---|
DropStep | V().drop(), E().drop(), properties().drop() |
AddPropertyStep | property(name, value, ...) |
DseGraphFrame can be used to drop millions of vertices or edges at once, and is much faster for bulk property updates than Gremlin OLAP or OLTP. For example, this query drops all person vertices and their associated edges:
g.V().hasLabel("person").drop().iterate()
Using DseGraphFrame in Scala
GraphTraversal is a Java interface that extends the Java Iterator interface. To iterate through the results of a traversal as a DataFrame, use the df() method. DseGraphFrame supports implicit conversion to DataFrame.
The following example will traverse the vertices of a graph using TinkerPop and then show the result as a DataFrame.
g.V().out().show
In some cases you may need to use the TinkerPop Java API to get the correct TinkerPop objects.
For example, to extract the DSE Graph Id object, the Traversal Java iterator can be converted to a Scala iterator, which allows direct access to the TinkerPop representation of the Id. This approach lets you use the original Id instead of the DataFrame methods, which return the DataFrame String representation of the Id. You can also use the toList() and toSet() methods to get the appropriate Id objects.
import scala.collection.JavaConverters._
for (i <- g.V().id().asScala) println(i)
{~label=vertex, community_id=748226688, member_id=0}
{~label=custom, name=Name, value=1}
g.V.id.toSet
res18: java.util.Set[Object] = [{~label=demigod, community_id=224391936, member_id=0}, ...
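The conversion described above is ordinary Java-to-Scala iterator wrapping. As a minimal sketch that runs without a graph, the following uses a plain java.util.Iterator of made-up Id strings as a stand-in for the iterator returned by g.V().id(). The values and the names javaIds, scalaIds, and collected are illustrative assumptions, not part of the DSE API.

```scala
import scala.collection.JavaConverters._

// Stand-in for the java.util.Iterator that a traversal exposes.
// The element values are hypothetical placeholders, not real DSE Graph Ids.
val javaIds: java.util.Iterator[AnyRef] =
  java.util.Arrays.asList[AnyRef]("id-1", "id-2", "id-3").iterator()

// asScala wraps the Java iterator without copying it, so each element
// keeps its original runtime type instead of being turned into a String.
val scalaIds: Iterator[AnyRef] = javaIds.asScala

// Consuming the Scala view also consumes the underlying Java iterator.
val collected: List[String] = scalaIds.map(_.toString).toList
```

Because the wrapper is a view over the same iterator, javaIds is exhausted once collected has been built. On Scala 2.13 and later, scala.jdk.CollectionConverters is the non-deprecated replacement for JavaConverters.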
The TinkerPop P (predicate) and T (constant) classes are imported by the Spark shell automatically.
g.E().groupCount().by(T.label)
g.V().has("age", P.gt(30)).show
For standalone applications, import these classes.
import org.apache.tinkerpop.gremlin.structure.T
import org.apache.tinkerpop.gremlin.process.traversal.P
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__
Scala is not always able to infer the return type, especially in the Spark shell. In that case, provide the type of the property values explicitly.
g.V().values[Any]("name").next()
Or similarly:
val n: String = g.V().values("name").next()
Explicitly set the type when dropping properties.
g.V().properties[Any]("age", "name").drop().iterate()
In this case, using the DataFrame API is easier as you do not need to specify the type.
g.V().properties("age", "name").drop().show()
++
||
++
++
g.V().values("age").show()
+-----+
|  age|
+-----+
|10000|
Method | Use case | Example |
---|---|---|
hasNext() | You want to know if there's a result, but you don't care about the value. | Did Alice create any other vertices? g.V().has("name", "Alice").outE("created").hasNext() |
next() | You know that there is at least 1 result and you want to get the first one (or the second if you call it twice, and so on). | Get the vertex label distribution. Group steps will always return exactly 1 result. g.V().groupCount().by(T.label).next() |
iterate() | You just want to execute the traversal, but don't care about the result or whether it did anything at all. | Set the age of every person to 10. g.V().property("age", 10).iterate() |
toList(), toSet() | You expect the result to contain an arbitrary number of items and you want to get all of them. | Get all the people Alice knows. g.V().has("name", "Alice").out("knows").toList() |
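The difference between these terminal methods comes down to how much of the underlying iterator each one consumes. The sketch below illustrates that with a small class named MiniTraversal. It is a hypothetical stand-in written for this example, not a TinkerPop or DSE type, and the people() helper and its names are made up.

```scala
// Hypothetical stand-in for a traversal; it only mirrors the four
// terminal methods from the table on top of a java.util.Iterator.
final class MiniTraversal[A](underlying: java.util.Iterator[A]) {
  // True if at least one result remains; consumes nothing.
  def hasNext: Boolean = underlying.hasNext

  // Returns the next result and advances by one element.
  def next(): A = underlying.next()

  // Drains every remaining result and discards it.
  def iterate(): MiniTraversal[A] = { while (hasNext) next(); this }

  // Drains every remaining result into a list.
  def toList(): java.util.List[A] = {
    val out = new java.util.ArrayList[A]()
    while (hasNext) out.add(next())
    out
  }
}

// Illustrative data source standing in for a query result.
def people(): MiniTraversal[String] =
  new MiniTraversal(java.util.Arrays.asList("Alice", "Bob", "Carol").iterator())

val t = people()
val nonEmpty = t.hasNext          // checks for a result without consuming it
val first = t.next()              // first result
val second = t.next()             // calling next() again yields the second
val rest = t.toList()             // whatever is left after the two next() calls
val drained = people().iterate()  // runs to completion, keeps nothing
```

In this sketch, calling next() on an exhausted MiniTraversal throws java.util.NoSuchElementException, which is why next() suits cases where at least one result is known to exist.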