Graph

The DSE Graph service processes graph queries written in the Gremlin language. Session#execute_graph and Session#execute_graph_async transmit graph queries to DSE Graph. The response is a graph result set, which may contain domain object representations of graph objects.

A script using the DSE driver to execute graph queries will typically begin like this:

require 'dse'

# Connect to DSE and create a session whose graph queries will be tied to the graph
# named 'mygraph' by default. See the documentation for Dse::Graph::Options for all
# supported graph options.
cluster = Dse.cluster(graph_name: 'mygraph')
session = cluster.connect

The DSE driver is a wrapper around the core Cassandra driver, so any valid options to the core driver are valid in the DSE driver as well. This includes specifying execution profiles.

Execution Profiles

Execution profiles were introduced in v3.1.0 of the Cassandra driver to group together a set of options for executing queries. The DSE driver provides the Dse::Graph::ExecutionProfile class to encapsulate graph options and core execution profile attributes. Graph execution profile attributes, like their Cassandra driver counterparts, fall back to system default values when not specified:

  • load_balancing_policy: LoadBalancing::Policies::TokenAware.new(LoadBalancing::Policies::DCAwareRoundRobin.new, true)
  • retry_policy: Retry::Policies::Default.new
  • consistency: :local_one

The exception to this rule is the timeout, which defaults to 30 seconds for graph execution profiles, while the Cassandra system default is 12 seconds.

Most of the graph options default to nil and defer to values in server-side configuration. The following options, however, do have client-side defaults:

  • graph_source: g
  • graph_language: gremlin-groovy

When querying these options in an execution profile, nil is returned. However, at request execution time, the appropriate defaults are sent in the request payload.
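The difference between what a profile reports and what is sent with a request can be pictured with a small stand-alone sketch. It does not use the driver: CLIENT_SIDE_DEFAULTS and request_payload are hypothetical names, and the resulting hash only illustrates the idea rather than the driver's actual request payload format.

```ruby
# Hypothetical stand-in for the two client-side defaults named above.
CLIENT_SIDE_DEFAULTS = {
  graph_source: 'g',
  graph_language: 'gremlin-groovy'
}.freeze

# Hypothetical helper: drop options that are nil (so server-side
# configuration applies to them), then fill in the client-side defaults for
# any option that was left unset.
def request_payload(profile_options)
  CLIENT_SIDE_DEFAULTS.merge(profile_options.compact)
end

profile_options = { graph_name: 'mygraph', graph_source: nil, graph_language: nil }

# Querying the profile still reports nil...
p profile_options[:graph_source]

# ...but the options built for the request carry the defaults.
p request_payload(profile_options)
```

An explicitly set value, such as graph_source: 'a', wins over the client-side default in this sketch because it survives the compact step and overrides the merged default.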

The DSE driver initializes the following three default graph execution profiles:

  • :default_graph - used by default by Session#execute_graph*
    • timeout: 30 because graph queries tend to run longer than CQL queries.
  • :default_graph_system - useful when running system queries.
    • timeout: 180 because system queries typically run longer than ordinary graph queries since they are mutating the schema and must synchronize with multiple nodes.
  • :default_graph_analytics - useful for analytics queries
    • timeout: 604800 (7 days), because analytics queries can run for a very long time.
    • graph_source: a because analytics queries must run against the ‘analytics’ source.
    • load_balancing_policy: Dse::LoadBalancing::Policies::HostTargeting.new(<token-aware-dc-aware-round-robin>), which gives priority to the analytics master node. Analytics queries must run on the analytics master, and without this policy, the client would likely send requests to a different (coordinator) node and that node would have to forward the request to the analytics master. Thus, this load-balancing policy saves a network hop.

Options not listed above fall back to the system defaults. If you define your own :default_graph* profiles, set not only the options you want to override but also any options you want to keep from the internally built :default_graph* profile; otherwise your profile falls back to the system defaults for them (e.g. load_balancing_policy for the :default_graph_analytics profile).
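The pitfall can be modelled with plain hashes. This is a sketch under stated assumptions, not driver code: the policies are represented by symbolic labels, effective_attributes is a hypothetical helper, and the attribute values are copied from the lists in this section.

```ruby
# System defaults for graph profiles (policies shown as labels only).
SYSTEM_DEFAULTS = {
  load_balancing_policy: :token_aware_dc_aware_round_robin,
  retry_policy: :default,
  consistency: :local_one,
  timeout: 30
}.freeze

# What the internally built :default_graph_analytics profile overrides.
BUILT_IN_ANALYTICS = {
  timeout: 604_800, # 7 * 24 * 60 * 60 seconds
  graph_source: 'a',
  load_balancing_policy: :host_targeting
}.freeze

# Hypothetical helper: a profile's effective attributes are the system
# defaults overlaid with whatever the profile sets explicitly.
def effective_attributes(profile)
  SYSTEM_DEFAULTS.merge(profile)
end

# Replacing the analytics profile but setting only the timeout silently
# loses graph_source and the host-targeting policy.
p effective_attributes({ timeout: 3600 })

# Carrying the built-in values forward keeps them.
p effective_attributes(BUILT_IN_ANALYTICS.merge(timeout: 3600))
```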

Graph options specified to Dse.cluster are stored in the :default_graph graph execution profile. Thus, without specifying any execution-profile parameters to Dse.cluster, the resulting profiles will often be as you want them. In the above example, we want graph queries to interact with graph mygraph most of the time. In addition, if the graph_name graph option is specified, it is also set in the :default_graph_analytics profile.

Just as in the core driver, execution profiles provide default options for executing queries. Specify overrides as options to Session#execute_graph:

session.execute_graph('g.V()', graph_name: 'starmap',
                      execution_profile: :default_graph_analytics)

Define entirely new profiles when creating the cluster object. Profile names should be strings or symbols, though they can be any type of object that has a reliable hashcode because ultimately the name becomes a key in a hash.

cluster = Dse.cluster(
  execution_profiles: {
    test1: Dse::Graph::ExecutionProfile.new(graph_source: 'g', timeout: 5),
    'test2' => Dse::Graph::ExecutionProfile.new(graph_source: 'a', timeout: 12)
  }
)
session = cluster.connect
result = session.execute_graph('g.V()', execution_profile: :test1)
result2 = session.execute_graph('g.V()', execution_profile: 'test2')
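Because a profile name ends up as a hash key, :test1 and 'test1' are different names. The following stand-alone sketch illustrates this with an ordinary Ruby hash; PROFILES and find_profile are hypothetical, the values are plain labels rather than Dse::Graph::ExecutionProfile objects, and the error raised here is this sketch's own choice, not a statement about how the driver reacts to an unknown name.

```ruby
# Stand-in for the execution_profiles hash passed to Dse.cluster.
PROFILES = {
  test1: 'profile with graph_source g',
  'test2' => 'profile with graph_source a'
}.freeze

# Hypothetical lookup that mirrors a plain hash fetch by profile name.
def find_profile(name)
  PROFILES.fetch(name) do
    raise ArgumentError, "unknown execution profile: #{name.inspect}"
  end
end

puts find_profile(:test1)
puts find_profile('test2')

begin
  # A String, although the profile was registered under a Symbol.
  find_profile('test1')
rescue ArgumentError => e
  puts e.message
end
```

Referring to each profile with exactly the object type used when defining it avoids this mismatch.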

Note that it is illegal to specify both execution_profiles and the above-mentioned primitive options when initializing the cluster. To change default behavior alongside your own profiles, define the :default and :default_graph* profiles as appropriate.

Expert Options

In an effort to make the DSE driver compatible with future versions of DataStax Enterprise Graph, graph execution profiles also support specifying arbitrary key-value options. These “expert options” presumably exist in an as-yet unreleased version of DataStax Enterprise. To leverage this feature, simply supply an expert_options hash when creating your execution profile or executing your graph statement:

profile = Dse::Graph::ExecutionProfile.new(graph_source: 'g', expert_options: {'some_option' => 'some_value'})

session.execute_graph('g.V()', expert_options: {'some_option' => 'some_value'})

Vertices

Vertices in DSE Graph have properties. A property may have multiple values; these are represented as an array when manipulating a Vertex object. A property value may also have properties of its own (known as meta-properties). These meta-properties are simple key-value pairs of strings; they do not nest.

# Run a query to get all the vertices in our graph.
results = session.execute_graph('g.V()')

# Each result is a Dse::Graph::Vertex.
# Print out the label and a few of its properties.
puts "Number of vertex results: #{results.size}"
results.each do |v|
   # Start with the label
   puts "#{v.label}:"

   # Vertex properties support multiple values as well as meta-properties
   # (simple key-value attributes that apply to a given property's value).
   #
   # Emit the 'name' property's first value.
   puts "  name: #{v.properties['name'][0].value}"

   # Name again, using our abbreviated syntax
   puts "  name: #{v['name'][0].value}"

   # Print all the values of the 'name' property
   values = v['name'].map do |vertex_prop|
     vertex_prop.value
   end
   puts "  all names: #{values.join(',')}"

   # That's a little inconvenient. So use the 'values' shortcut:
   puts "  all names: #{v['name'].values.join(',')}"

   # Let's get the 'title' meta-property of 'name's first value.
   puts "  title: #{v['name'][0].properties['title']}"

   # This has a short-cut syntax as well:
   puts "  title: #{v['name'][0]['title']}"
end
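The nesting used above (vertex, then property name, then an array of values, each with optional meta-properties) can be pictured without a running cluster. SketchVertex and SketchProperty below are hypothetical stand-ins that only mimic the shape described in this section; they are not the driver's classes, and the sample data is invented for illustration.

```ruby
# One value of a vertex property, plus its (non-nesting) meta-properties.
class SketchProperty
  attr_reader :value, :properties

  def initialize(value, properties = {})
    @value = value
    @properties = properties
  end

  # Shortcut: prop['title'] reads a meta-property.
  def [](key)
    @properties[key]
  end
end

# A vertex maps each property name to an array of property values.
class SketchVertex
  attr_reader :label, :properties

  def initialize(label, properties)
    @label = label
    @properties = properties
  end

  # Shortcut: vertex['name'] reads the array of values for a property.
  def [](name)
    @properties[name]
  end
end

# Invented sample data: one vertex whose 'name' property has two values.
def sample_vertex
  SketchVertex.new(
    'person',
    { 'name' => [SketchProperty.new('Ada', { 'title' => 'Countess' }),
                 SketchProperty.new('Augusta')] }
  )
end

v = sample_vertex
puts v['name'][0].value
puts v['name'].map(&:value).join(',')
puts v['name'][0]['title']
```

A value without meta-properties simply returns nil for any meta-property key in this sketch.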

Edges

Edges connect a pair of vertices in DSE Graph. They also have properties, but they are simple key-value pairs of strings.

results = session.execute_graph('g.E()')

puts "Number of edge results: #{results.size}"
# Each result is a Dse::Graph::Edge object.
results.each do |e|
   # Start with the label
   puts "#{e.label}:"

   # Now the IDs of the two vertices that this edge connects.
   puts "  in id: #{e.in_v}"
   puts "  out id: #{e.out_v}"

   # Edge properties are simple key-value pairs; sort of like
   # meta-properties on vertices.

   puts "  edge_prop1: #{e.properties['edge_prop1']}"

   # This supports the short-cut syntax as well:
   puts "  edge_prop1: #{e['edge_prop1']}"
end

Path and Arbitrary Objects

Paths describe a path between two vertices. The graph response from DSE does not indicate that the response is a path, so the driver cannot automatically coerce such results into Path objects. The driver returns a Dse::Graph::Result object in such cases, and you can coerce the result yourself.

results = session.execute_graph('g.V().in().path()')
puts "Number of path results: #{results.size}"
results.each do |r|
  # The 'value' of the result is a hash representation of the JSON result.
  puts "first label: #{r.value['labels'].first}"

  # Since we know this is a Path result, coerce it and use the Path object's methods.
  p = r.as_path
  puts "first label: #{p.labels.first}"
end

When a query has a simple result, the :value attribute of the result object contains the simple value rather than a hash.

results = session.execute_graph('g.V().count()')
puts "Number of vertices: #{results.first.value}"

Duration Graph Type

DSE Graph supports several datatypes for properties. The Duration type represents a duration of time. When DSE Graph returns properties of this type, the string representation is non-trivial and requires parsing in order for the user to really gain any information from it.

The driver includes a helper class to parse such responses from DSE Graph as well as to send such values as bound parameters in requests:

# Create a Duration property in the schema called 'runtime' and declare that 'process' vertices can have this property.
session.execute_graph(
    "schema.propertyKey('runtime').Duration().ifNotExists().create();
      schema.propertyKey('name').Text().ifNotExists().create();
      schema.vertexLabel('process').properties('name', 'runtime').ifNotExists().create()")

# We want to record that a process ran for 1 hour, 2 minutes, 3.5 seconds.
runtime = Dse::Graph::Duration.new(0, 1, 2, 3.5)
session.execute_graph(
    "graph.addVertex(label, 'process', 'name', 'calculator', 'runtime', my_runtime);",
    arguments: {'my_runtime' => runtime})

# Now retrieve the vertex. Assume this is the only vertex in the graph for simplicity. 
v = session.execute_graph('g.V()').first
runtime = Dse::Graph::Duration.parse(v['runtime'].first.value)
puts "#{runtime.hours} hours, #{runtime.minutes} minutes, #{runtime.seconds} seconds"
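To give a feel for the kind of parsing involved, here is a stand-alone sketch. It assumes the server's textual form is ISO 8601 style (for example "PT1H2M3.5S"); that format, the SketchDuration struct, and the parse_duration and total_seconds helpers are assumptions made for illustration and do not describe the internals of Dse::Graph::Duration.

```ruby
# Hypothetical value object with the same four components used above.
SketchDuration = Struct.new(:days, :hours, :minutes, :seconds)

# Assumed textual form: ISO 8601 style, e.g. "PT1H2M3.5S" or "P2DT4H".
DURATION_PATTERN =
  /\AP(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?\z/

# Parse the assumed format; components that are absent count as zero.
def parse_duration(text)
  match = DURATION_PATTERN.match(text)
  raise ArgumentError, "unrecognized duration: #{text.inspect}" unless match

  SketchDuration.new(match[1].to_i, match[2].to_i, match[3].to_i, match[4].to_f)
end

# Collapse a duration into a single number of seconds.
def total_seconds(duration)
  duration.days * 86_400 + duration.hours * 3_600 +
    duration.minutes * 60 + duration.seconds
end

runtime = parse_duration('PT1H2M3.5S')
puts "#{runtime.hours} hours, #{runtime.minutes} minutes, #{runtime.seconds} seconds"
puts "total: #{total_seconds(runtime)} seconds"
```

In real code, prefer Dse::Graph::Duration.parse as shown above; the sketch only illustrates why a helper is more convenient than handling the raw string.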

Miscellaneous Features

There are a number of other features in the API to make development easier.

# We can access particular items in the result-set via array dereference
p results[1]

# Run a query against a different graph, but don't mess with the cluster default.
results = session.execute_graph('g.V().count()', graph_name: 'my_other__graph')

# Set an "expert" option for which we don't have a proper graph option.
# NOTE: Such options are not part of the public api and may change in a future
# release of DSE.
results = session.execute_graph('g.V().count()', graph_name: 'my_other__graph',
                                                 expert_options: {'super-cool-option' => 'value'})

# Create a statement object encapsulating a graph query, options, parameters,
# for ease of reuse.
statement = Dse::Graph::Statement.new('g.V().limit(n)', {n: 3}, graph_name: 'mygraph')
results = session.execute_graph(statement)