Graph
The DSE Graph service processes graph queries written in the Gremlin language.
Session#execute_graph and Session#execute_graph_async send graph queries to DSE Graph.
The response is a graph result set, which may contain domain-object representations of graph objects such as vertices and edges.
A script using the DSE driver to execute graph queries will typically begin like this:
require 'dse'
# Connect to DSE and create a session whose graph queries will be tied to the graph
# named 'mygraph' by default. See the documentation for Dse::Graph::Options for all
# supported graph options.
cluster = Dse.cluster(graph_name: 'mygraph')
session = cluster.connect
The DSE driver is a wrapper around the core Cassandra driver, so any option that is valid for the core driver is valid for the DSE driver as well. This includes execution profiles.
Execution Profiles
Execution profiles were introduced in v3.1.0 of the Cassandra driver to group together a set of options for executing queries. The DSE driver provides the Dse::Graph::ExecutionProfile class to encapsulate graph options and core execution profile attributes. Like their Cassandra driver counterparts, graph execution profile attributes fall back to system default values when not specified:
- load_balancing_policy: LoadBalancing::Policies::TokenAware.new(LoadBalancing::Policies::DCAwareRoundRobin.new, true)
- retry_policy: Retry::Policies::Default.new
- consistency: :local_one
The one exception is timeout: graph execution profiles default to 30 seconds, whereas the Cassandra system default is 12 seconds.
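The fallback rule above can be modeled as a plain hash merge. This is a simplified sketch, not the driver's implementation. The constant and method names are hypothetical, and strings stand in for the real policy objects. It shows an unset attribute taking the system default, with graph profiles differing only in their 30-second timeout.

```ruby
# Simplified model of execution-profile fallback (not the driver's code).
# Strings stand in for the real policy objects.
CORE_SYSTEM_DEFAULTS = {
  load_balancing_policy: 'TokenAware(DCAwareRoundRobin, true)',
  retry_policy: 'Retry::Policies::Default',
  consistency: :local_one,
  timeout: 12
}.freeze

# Graph profiles share the core defaults except for the 30-second timeout.
GRAPH_SYSTEM_DEFAULTS = CORE_SYSTEM_DEFAULTS.merge(timeout: 30).freeze

# Attributes the profile leaves nil (or omits) take the system default.
def effective_attributes(profile_attributes, system_defaults)
  system_defaults.merge(profile_attributes.compact)
end

# A graph profile that only overrides consistency: timeout comes out as 30
# (graph default), consistency as :quorum, and the rest from the defaults.
quorum_profile = effective_attributes({ consistency: :quorum, timeout: nil }, GRAPH_SYSTEM_DEFAULTS)
```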
Most graph options default to nil and defer to values in the server-side configuration. The following options, however, do have client-side defaults:
* graph_source: g
* graph_language: gremlin-groovy
When you query these options on an execution profile, nil is returned. At request execution time, however, the driver sends the appropriate defaults in the request payload.
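The distinction between what a profile reports and what goes over the wire can be sketched as follows. This is a hypothetical model, not the driver's code: the method name and the symbol keys are illustrative. It also assumes that options which are still nil (such as an unset graph_name) are omitted, so the server-side configuration applies.

```ruby
# Hypothetical sketch: graph_source and graph_language stay nil in the
# profile and are filled in only when the request payload is assembled.
CLIENT_SIDE_GRAPH_DEFAULTS = {
  graph_source: 'g',
  graph_language: 'gremlin-groovy'
}.freeze

# Other nil options are dropped, so the server-side configuration applies.
def payload_graph_options(profile_graph_options)
  CLIENT_SIDE_GRAPH_DEFAULTS.merge(profile_graph_options.compact)
end

profile_graph_options = { graph_name: 'mygraph', graph_source: nil, graph_language: nil }
profile_graph_options[:graph_source]                        # nil when queried on the profile
payload_graph_options(profile_graph_options)[:graph_source] # "g" in the request payload
```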
The DSE driver initializes the following three default graph execution profiles:
- :default_graph: used by default by Session#execute_graph*.
  - timeout: 30, because graph queries tend to run longer than CQL queries.
- :default_graph_system: useful when running system queries.
  - timeout: 180, because system queries typically run longer than ordinary graph queries: they mutate the schema and must synchronize with multiple nodes.
- :default_graph_analytics: useful for analytics queries.
  - timeout: 604800 (7 days), because analytics queries can run for a very long time.
  - graph_source: a, because analytics queries must run against the ‘analytics’ source.
  - load_balancing_policy: Dse::LoadBalancing::Policies::HostTargeting.new(<token-aware-dc-aware-round-robin>), which gives priority to the analytics master node. Analytics queries must run on the analytics master; without this policy, the client would likely send requests to a different (coordinator) node, which would then have to forward them to the analytics master. This load-balancing policy therefore saves a network hop.
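The built-in timeouts are given in seconds. The short check below confirms that 604800 seconds is exactly seven days and that the system-profile timeout is three minutes. The constant names are only for illustration.

```ruby
# Timeouts (in seconds) of the built-in graph profiles, as documented.
BUILT_IN_GRAPH_TIMEOUTS = {
  default_graph: 30,
  default_graph_system: 180,
  default_graph_analytics: 604_800
}.freeze

SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 24 * 60 * SECONDS_PER_MINUTE # 86_400

# 180 s / 60 s per minute = 3 minutes
system_minutes = BUILT_IN_GRAPH_TIMEOUTS[:default_graph_system] / SECONDS_PER_MINUTE
# 604_800 s / 86_400 s per day = 7 days
analytics_days = BUILT_IN_GRAPH_TIMEOUTS[:default_graph_analytics] / SECONDS_PER_DAY
```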
Unspecified options above fall back to the system defaults. If you define your own :default_graph* profiles, take care to set both the options you want to override and the options you’d like to keep unchanged from the internally built :default_graph* profile. Otherwise your profile falls back to the system defaults for them (e.g. load_balancing_policy for the :default_graph_analytics profile).
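The pitfall described above can be illustrated with the same hash-merge model. This is a simplified sketch, not the driver's code, and the names are hypothetical. It assumes that a user-defined profile replaces the built-in one outright, so any attribute the profile leaves unset resolves to the system default rather than to the built-in profile's value.

```ruby
# Simplified model: a user-defined profile replaces the built-in one, and
# unset attributes fall back to these system defaults.
FALLBACK_DEFAULTS = {
  load_balancing_policy: 'TokenAware(DCAwareRoundRobin)',
  timeout: 30,
  graph_source: nil
}.freeze

# Values of the internally built :default_graph_analytics profile.
BUILT_IN_ANALYTICS = {
  load_balancing_policy: 'HostTargeting(TokenAware(DCAwareRoundRobin))',
  timeout: 604_800,
  graph_source: 'a'
}.freeze

def resolve_profile(profile_attributes)
  FALLBACK_DEFAULTS.merge(profile_attributes.compact)
end

# Pitfall: only the timeout is set, so the analytics-specific
# load-balancing policy and graph source are lost.
naive = resolve_profile({ timeout: 3600 })

# Safer: start from the built-in values and override only what you need.
careful = resolve_profile(BUILT_IN_ANALYTICS.merge(timeout: 3600))
```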
Graph options specified to Dse#cluster are stored in the :default_graph graph execution profile. Thus, if you pass no execution-profile parameters to Dse#cluster, the resulting profiles will often be exactly what you want. In the example above, we want graph queries to interact with graph mygraph most of the time. In addition, if the graph_name graph option is specified, it is also set in the :default_graph_analytics profile.
Just as in the core driver, execution profiles provide default options for executing queries. Specify overrides as options to Session#execute_graph:
session.execute_graph('g.V()', graph_name: 'starmap',
execution_profile: :default_graph_analytics)
You can also define entirely new profiles when creating the cluster object. Profile names should be strings or symbols, though any object with a reliable hash value will work, because the name ultimately becomes a key in a hash.
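Because profile names are hash keys, ordinary Ruby hash semantics apply. The sketch below uses strings to stand in for Dse::Graph::ExecutionProfile objects. It shows that a symbol and a string with the same spelling are different names, and that a value-based key type such as a Struct also works. ProfileName is a hypothetical example type.

```ruby
# Plain strings stand in for Dse::Graph::ExecutionProfile objects.
profiles = {
  test1: 'registered under the symbol :test1',
  'test2' => 'registered under the string "test2"'
}

profiles[:test1]  # found
profiles['test1'] # nil: a String key never matches a Symbol key
profiles[:test2]  # nil: this profile was registered under a String

# Any object with value-based #hash and #eql? can serve as a name.
# Struct provides both, so an equal struct finds the same entry.
ProfileName = Struct.new(:workload, :environment)
profiles[ProfileName.new(:analytics, :staging)] = 'registered under a Struct'
profiles[ProfileName.new(:analytics, :staging)] # found
```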
cluster = Dse.cluster(
execution_profiles: {
test1: Dse::Graph::ExecutionProfile.new(graph_source: 'g', timeout: 5),
'test2' => Dse::Graph::ExecutionProfile.new(graph_source: 'a', timeout: 12)
}
)
session = cluster.connect
result = session.execute_graph('g.V()', execution_profile: :test1)
result2 = session.execute_graph('g.V()', execution_profile: 'test2')
Note that it is illegal to specify execution_profiles together with the above-mentioned primitive options when initializing the cluster. If you want to change default behavior as well as define your own profiles, define the :default and :default_graph* profiles as appropriate.
Expert Options
To keep the DSE driver compatible with future versions of DataStax Enterprise Graph, graph execution profiles also support arbitrary key-value options. These “expert options” are intended for options that may exist only in an as-yet unreleased version of DataStax Enterprise. To use this feature, supply an expert_options hash when creating your execution profile or executing your graph statement:
profile = Dse::Graph::ExecutionProfile.new(graph_source: 'g', expert_options: {'some_option' => 'some_value'})
session.execute_graph('g.V()', expert_options: {'some_option' => 'some_value'})
Vertices
Vertices in DSE Graph have properties. A property may have multiple values, which are represented as an array when you manipulate a Vertex object. Each property value may also have properties of its own (known as meta-properties). These meta-properties are simple key-value pairs of strings; they do not nest.
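The shape described above (name, then an array of values, then a flat hash of meta-properties per value) can be modeled with plain Ruby. This is a hypothetical model for illustration only. VertexPropertyValue is not one of the driver's classes, and the sample data is invented.

```ruby
# Hypothetical model of a multi-valued vertex property: each value carries
# its own flat hash of string meta-properties.
VertexPropertyValue = Struct.new(:value, :properties) do
  # Shortcut for reading a meta-property, mirroring v['name'][0]['title'].
  def [](key)
    properties[key]
  end
end

vertex_properties = {
  'name' => [
    VertexPropertyValue.new('Alice', { 'title' => 'Dr.' }),
    VertexPropertyValue.new('Ali', {})
  ]
}

first_name = vertex_properties['name'][0].value    # "Alice"
all_names  = vertex_properties['name'].map(&:value) # ["Alice", "Ali"]
title      = vertex_properties['name'][0]['title']  # "Dr."
```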
# Run a query to get all the vertices in our graph.
results = session.execute_graph('g.V()')
# Each result is a Dse::Graph::Vertex.
# Print out the label and a few of its properties.
puts "Number of vertex results: #{results.size}"
results.each do |v|
# Start with the label
puts "#{v.label}:"
# Vertex properties support multiple values as well as meta-properties
# (simple key-value attributes that apply to a given property's value).
#
# Emit the 'name' property's first value.
puts " name: #{v.properties['name'][0].value}"
# Name again, using our abbreviated syntax
puts " name: #{v['name'][0].value}"
# Print all the values of the 'name' property
values = v['name'].map do |vertex_prop|
vertex_prop.value
end
puts " all names: #{values.join(',')}"
# That's a little inconvenient. So use the 'values' shortcut:
puts " all names: #{v['name'].values.join(',')}"
# Let's get the 'title' meta-property of 'name's first value.
puts " title: #{v['name'][0].properties['title']}"
# This has a short-cut syntax as well:
puts " title: #{v['name'][0]['title']}"
end
Edges
Edges connect a pair of vertices in DSE Graph. Edges also have properties, but these are simple key-value pairs of strings.
results = session.execute_graph('g.E()')
puts "Number of edge results: #{results.size}"
# Each result is a Dse::Graph::Edge object.
results.each do |e|
# Start with the label
puts "#{e.label}:"
# Now the ids of the two vertices that this edge connects.
puts " in id: #{e.in_v}"
puts " out id: #{e.out_v}"
# Edge properties are simple key-value pairs; sort of like
# meta-properties on vertices.
puts " edge_prop1: #{e.properties['edge_prop1']}"
# This supports the short-cut syntax as well:
puts " edge_prop1: #{e['edge_prop1']}"
end
Path and Arbitrary Objects
A path describes the route between two vertices. The graph response from DSE does not indicate that the response is a path, so the driver cannot automatically coerce such results into Path objects. Instead, it returns a Dse::Graph::Result object, which you can coerce yourself with as_path.
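The reason the caller has to coerce is that a path arrives as an ordinary hash, indistinguishable from any other map result. The sketch below wraps such a hash in a small value object. SimplePath and to_simple_path are hypothetical; the driver's own coercion is Result#as_path. The 'objects' key is an assumption based on the TinkerPop GraphSON path layout, and the sample data is invented.

```ruby
# Hypothetical coercion of a raw path hash into a value object.
SimplePath = Struct.new(:labels, :objects)

# Hash#fetch raises KeyError if the hash does not look like a path,
# since nothing in the response itself marks it as one.
def to_simple_path(value)
  SimplePath.new(value.fetch('labels'), value.fetch('objects'))
end

raw = {
  'labels' => [['a'], ['b']],
  'objects' => [{ 'id' => 1 }, { 'id' => 2 }]
}
path = to_simple_path(raw)
path.labels.first # ["a"]
```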
results = session.execute_graph('g.V().in().path()')
puts "Number of path results: #{results.size}"
results.each do |r|
# The 'value' of the result is a hash representation of the JSON result.
puts "first label: #{r.value['labels'].first}"
# Since we know this is a Path result, coerce it and use the Path object's methods.
p = r.as_path
puts "first label: #{p.labels.first}"
end
When a query has a simple result, the :value attribute of the result object contains the simple value rather than a hash.
results = session.execute_graph('g.V().count()')
puts "Number of vertices: #{results.first.value}"
Duration Graph Type
DSE Graph supports several datatypes for properties. The Duration type represents a length of time. DSE Graph returns values of this type as strings whose format is non-trivial, so they must be parsed before they are useful.
The driver includes a helper class to parse such responses from DSE Graph as well as to send such values as bound parameters in requests:
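To show what that parsing involves, here is a plain-Ruby sketch. It assumes DSE Graph sends durations in the ISO-8601 form produced by java.time.Duration#toString (for example PT1H2M3.5S), optionally with a days part. parse_duration and ParsedDuration are hypothetical illustrations; in real code, use Dse::Graph::Duration.parse. Negative durations and year or month fields are not handled.

```ruby
# Hypothetical stand-in for Dse::Graph::Duration.parse, assuming strings
# like "PT1H2M3.5S" or "P1DT2H" (positive values, no years/months).
ParsedDuration = Struct.new(:days, :hours, :minutes, :seconds)

DURATION_PATTERN = /\AP(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?\z/

def parse_duration(text)
  match = DURATION_PATTERN.match(text)
  raise ArgumentError, "not an ISO-8601 duration: #{text.inspect}" unless match

  days, hours, minutes, seconds = match.captures
  # Missing components are nil; nil.to_i is 0 and nil.to_f is 0.0.
  ParsedDuration.new(days.to_i, hours.to_i, minutes.to_i, seconds.to_f)
end

# Total length in seconds, handy for comparisons.
def total_seconds(duration)
  ((duration.days * 24 + duration.hours) * 60 + duration.minutes) * 60 + duration.seconds
end

runtime = parse_duration('PT1H2M3.5S')
puts "#{runtime.hours} hours, #{runtime.minutes} minutes, #{runtime.seconds} seconds"
```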
# Create a Duration property in the schema called 'runtime' and declare that 'process' vertices can have this property.
session.execute_graph(
"schema.propertyKey('runtime').Duration().ifNotExists().create();
schema.propertyKey('name').Text().ifNotExists().create();
schema.vertexLabel('process').properties('name', 'runtime').ifNotExists().create()")
# We want to record that a process ran for 1 hour, 2 minutes, 3.5 seconds.
runtime = Dse::Graph::Duration.new(0, 1, 2, 3.5)
session.execute_graph(
"graph.addVertex(label, 'process', 'name', 'calculator', 'runtime', my_runtime);",
arguments: {'my_runtime' => runtime})
# Now retrieve the vertex. Assume this is the only vertex in the graph for simplicity.
v = session.execute_graph('g.V()').first
runtime = Dse::Graph::Duration.parse(v['runtime'].first.value)
puts "#{runtime.hours} hours, #{runtime.minutes} minutes, #{runtime.seconds} seconds"
Miscellaneous Features
There are a number of other features in the API to make development easier.
# We can access particular items in the result-set via array dereference
p results[1]
# Run a query against a different graph, but don't mess with the cluster default.
results = session.execute_graph('g.V().count()', graph_name: 'my_other__graph')
# Set an "expert" option for which we don't have a proper graph option.
# NOTE: Such options are not part of the public api and may change in a future
# release of DSE.
results = session.execute_graph('g.V().count()', graph_name: 'my_other__graph',
expert_options: {'super-cool-option' => 'value'})
# Create a statement object encapsulating a graph query, options, parameters,
# for ease of reuse.
statement = Dse::Graph::Statement.new('g.V().limit(n)', {n: 3}, graph_name: 'mygraph')
results = session.execute_graph(statement)