Apache TinkerPop graph computing framework
Describe the Apache TinkerPop framework.
Apache TinkerPop is a graph abstraction layer that works with many different graph databases and graph processors. It is composed of two elements: a structure API and a process API.
The primary components of the Apache TinkerPop structure API are:
- Graph: maintains a set of vertices and edges
- Vertex: extends the general class Element and maintains a set of incoming and outgoing edges, as well as a collection of properties and a string label that denotes the vertex type
- Edge: extends Element and maintains an incoming and an outgoing vertex, as well as a collection of properties and a string label that denotes the edge type
- Property: a string key associated with a value
- VertexProperty: a string key associated with a value, as well as a collection of metadata properties (vertices only)
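To make the relationships between these structure API components concrete, here is a minimal in-memory sketch. The classes (Element, Vertex, Edge, VertexProperty, Graph) and methods such as add_vertex and add_edge are hypothetical Python stand-ins that only mirror the concepts listed above. They are not TinkerPop's actual Java interfaces. Assumptions: edge properties are plain key/value pairs standing in for Property, and each vertex property key holds a single value. This is a simplification, since TinkerPop also supports multiple values per key.

```python
from typing import Any, Dict, List


class Element:
    """Shared base: an id, a string label naming the element's type, and properties."""

    def __init__(self, id_: int, label: str) -> None:
        self.id = id_
        self.label = label
        self.properties: Dict[str, Any] = {}


class VertexProperty:
    """Key/value pair on a vertex that can carry its own metadata properties."""

    def __init__(self, key: str, value: Any) -> None:
        self.key = key
        self.value = value
        self.meta: Dict[str, Any] = {}


class Vertex(Element):
    def __init__(self, id_: int, label: str) -> None:
        super().__init__(id_, label)
        self.out_edges: List["Edge"] = []  # edges leaving this vertex
        self.in_edges: List["Edge"] = []   # edges entering this vertex


class Edge(Element):
    def __init__(self, id_: int, label: str, out_v: Vertex, in_v: Vertex) -> None:
        super().__init__(id_, label)
        self.out_vertex = out_v  # vertex the edge starts from
        self.in_vertex = in_v    # vertex the edge points to


class Graph:
    """Maintains the sets of vertices and edges, keyed by id."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Vertex] = {}
        self.edges: Dict[int, Edge] = {}
        self._next_id = 0

    def _new_id(self) -> int:
        self._next_id += 1
        return self._next_id

    def add_vertex(self, label: str, **props: Any) -> Vertex:
        v = Vertex(self._new_id(), label)
        for key, value in props.items():
            # Vertex properties are VertexProperty objects so they can hold metadata.
            v.properties[key] = VertexProperty(key, value)
        self.vertices[v.id] = v
        return v

    def add_edge(self, out_v: Vertex, label: str, in_v: Vertex, **props: Any) -> Edge:
        e = Edge(self._new_id(), label, out_v, in_v)
        e.properties.update(props)  # edge properties are plain key/value pairs
        out_v.out_edges.append(e)
        in_v.in_edges.append(e)
        self.edges[e.id] = e
        return e


graph = Graph()
marko = graph.add_vertex("person", name="marko")
vadas = graph.add_vertex("person", name="vadas")
knows = graph.add_edge(marko, "knows", vadas, weight=0.5)
marko.properties["name"].meta["source"] = "import"  # metadata on a vertex property
```

Registering each new edge on both endpoints keeps the incoming and outgoing edge sets consistent with the edge's own incoming and outgoing vertex.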
The primary components of the Apache TinkerPop process API are:
- TraversalSource: a generator of traversals for a particular graph, domain-specific language (DSL), and execution engine
- Traversal<S,E>: a functional data flow process that transforms objects of type S into objects of type E
- GraphTraversal: a traversal DSL that is oriented towards the semantics of the raw graph (i.e., vertices, edges, etc.)
- GraphComputer: a system that processes the graph in parallel and, potentially, distributed over a multi-machine cluster
- VertexProgram: code executed at all vertices in a logically parallel manner, with intercommunication via message passing
- MapReduce: a computation that analyzes all vertices in the graph in parallel and yields a single reduced result
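The VertexProgram and MapReduce ideas can be illustrated with a single-process simulation. This is a toy model of the execution style, not TinkerPop's GraphComputer API. Connected components via minimum-id propagation is a hypothetical choice of algorithm, and the functions connected_components and component_sizes are illustrative names. In each superstep, every vertex sends its state to its neighbours and then applies the same update rule to the messages it received. Once the vertex program stops changing any state, a map/reduce pass over all vertices collapses the per-vertex results into one summary.

```python
from collections import defaultdict
from typing import Dict, List


def connected_components(adj: Dict[int, List[int]], max_iters: int = 100) -> Dict[int, int]:
    """Toy vertex program: every vertex adopts the smallest vertex id it hears about."""
    state = {v: v for v in adj}  # per-vertex state: current component id
    for _ in range(max_iters):
        # Message passing: each vertex sends its current state to every neighbour.
        inbox: Dict[int, List[int]] = defaultdict(list)
        for v, neighbours in adj.items():
            for n in neighbours:
                inbox[n].append(state[v])
        # The same update runs at every vertex. All messages were built from the
        # previous superstep's state, so the order of this loop does not matter.
        changed = False
        for v in adj:
            smallest = min([state[v]] + inbox[v])
            if smallest != state[v]:
                state[v] = smallest
                changed = True
        if not changed:  # halt once no vertex changed in a superstep
            break
    return state


def component_sizes(state: Dict[int, int]) -> Dict[int, int]:
    """Toy MapReduce over all vertices: map to (component, 1), reduce by summing."""
    mapped = [(component, 1) for component in state.values()]  # map phase
    sizes: Dict[int, int] = defaultdict(int)
    for component, count in mapped:  # reduce phase: sum counts per key
        sizes[component] += count
    return dict(sizes)


# Undirected toy graph with two components: {1, 2, 3} and {4, 5}.
adjacency = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
components = connected_components(adjacency)
print(components)                   # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
print(component_sizes(components))  # {1: 3, 4: 2}
```

Because each vertex reads only its own state and its inbox, the per-vertex updates are independent. A real GraphComputer can therefore run them in parallel, or distribute them across machines, without changing the result.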
A key feature of Apache TinkerPop is Gremlin, a graph traversal language and virtual machine. Apache TinkerPop and Gremlin are to graph databases what JDBC and SQL are to relational databases. Gremlin variants are available for many languages: Java, Groovy, Python, and others.
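Gremlin traversals are built by chaining steps onto a traversal source, conventionally named g. For example, the Gremlin query g.V().has('name','marko').out('knows').values('name') starts at every vertex and keeps the one named marko. It then follows outgoing knows edges and returns the names of the vertices reached. The sketch below imitates that chaining style in plain Python. It is a hypothetical toy, not gremlinpython and not TinkerPop's implementation, and to_list is an illustrative stand-in for a terminal step. The data is likewise hypothetical, loosely modelled on TinkerPop's "modern" toy graph. Each step lazily wraps the previous step's iterator, echoing the idea of a Traversal<S,E> as a data flow from inputs of type S to outputs of type E.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical in-memory data, loosely modelled on TinkerPop's "modern" toy graph.
VERTICES: Dict[int, Dict[str, Any]] = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    4: {"name": "josh", "age": 32},
}
EDGES: List[Tuple[int, str, int]] = [(1, "knows", 2), (1, "knows", 4)]  # (out, label, in)


class Traversal:
    """Toy fluent traversal: each step lazily wraps the previous step's iterator."""

    def __init__(self, source: Iterable[Any]) -> None:
        self._it: Iterator[Any] = iter(source)

    def _add_step(self, step: Callable[[Iterator[Any]], Iterator[Any]]) -> "Traversal":
        self._it = step(self._it)
        return self  # returning self enables method chaining

    def has(self, key: str, value: Any) -> "Traversal":
        # Filter step: keep vertex ids whose property `key` equals `value`.
        return self._add_step(lambda it: (v for v in it if VERTICES[v].get(key) == value))

    def out(self, label: str) -> "Traversal":
        # Flat-map step: vertex id -> ids at the far end of its outgoing `label` edges.
        return self._add_step(
            lambda it: (head for v in it for (tail, lbl, head) in EDGES if tail == v and lbl == label)
        )

    def values(self, key: str) -> "Traversal":
        # Map step: vertex id -> value of property `key` (skipped if absent).
        return self._add_step(lambda it: (VERTICES[v][key] for v in it if key in VERTICES[v]))

    def to_list(self) -> List[Any]:
        # Terminal step: nothing is evaluated until the iterator is drained here.
        return list(self._it)


class TraversalSource:
    """Hypothetical source that spawns traversals; V() starts from every vertex id."""

    def V(self) -> Traversal:
        return Traversal(VERTICES.keys())


g = TraversalSource()
names = g.V().has("name", "marko").out("knows").values("name").to_list()
print(names)  # ['vadas', 'josh']
```

Lazy generators keep each step independent of the others. New steps can be appended without touching existing ones, and no work happens until the terminal step asks for results.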