DSE Graph configuration
Configure DSE Graph.
Adjusting the DSE Graph configuration can make a development environment easier to use, while protecting a production environment and improving its performance. Some settings affect how applications interact with the graph database, while others affect internal processing within DSE. In addition, securing DSE Graph has important consequences, and a number of configuration settings secure cluster operation. Whether you are developing or running in production, a thorough knowledge of the configuration is vital.
General DataStax Graph (DSG) settings
Settings that affect DSG core functionality.
dse.yaml
The location of the dse.yaml file depends on the type of installation:

Installation type | Location |
---|---|
Package installations | /etc/dse/dse.yaml |
Tarball installations | installation_location/resources/dse/conf/dse.yaml |
dse.yaml Graph options
DSG stores cluster-wide options for DSE Graph in dse.yaml under the graph: and gremlin_server: keys.
Most of the commonly modified options are discussed in the sections below. Of particular note, the Graph sandbox is configured in the Gremlin Server options of the dse.yaml file. This feature is enabled by default and provides protection from malicious attacks within the JVM.
To modify dse.yaml settings, edit the file on each node in the cluster and restart each node. Settings in dse.yaml are scoped to the node (system level). The dse.yaml file can also be modified using OpsCenter.
remote.yaml Gremlin console options
The remote.yaml file is the primary configuration file for DSG Gremlin console connections to the Gremlin Server. Most options are self-explanatory. In particular, be aware that if you are using analytic OLAP queries with DSG, changes are required in this file.
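As a rough illustration of what such a connection file contains, the fragment below is a minimal sketch rather than the file shipped with DSE. It assumes the standard Apache TinkerPop driver settings hosts, port, and connectionPool.enableSsl. The host value is a placeholder, and the port matches the Gremlin Server port shown in the dse.yaml example in this topic. The changes needed for analytic OLAP queries are not shown.

```yaml
# remote.yaml (illustrative fragment, not the shipped default file)
# Host name is a placeholder; replace it with a node running the Gremlin Server.
hosts: [localhost]
# Must match the gremlin_server port configured in dse.yaml.
port: 8182
connectionPool: {
  # Assumption: set to true only when the Gremlin Server is configured for SSL.
  enableSsl: false
}
```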
Replication factor
The replication factor (RF) for a graph can affect the performance of reads and writes in DSG. Just as for the DSE database, the replication factor controls the number of replicas of the data that the distributed graph database stores across multiple nodes.
One keyspace is created for each graph. The replication factor is set when a graph is created.
Graph consistency levels
Consistency level in DSG is controlled with graph traversal options and is applied to read and write operations on a per-traversal basis. Gremlin queries execute CQL commands to insert, read, and update graph data via traversals, so the DSE database consistency level settings can affect the execution of graph operations. The consistency level is set with the with('consistency', cl_level) traversal option, which is used for both reads and writes.
DSE Graph security settings
Settings that affect DSE Graph security.
dse.yaml
The location of the dse.yaml file depends on the type of installation:

Installation type | Location |
---|---|
Package installations | /etc/dse/dse.yaml |
Tarball installations | installation_location/resources/dse/conf/dse.yaml |
Graph sandbox and whitelisted/blacklisted code
The Graph sandbox, configured under the gremlin_server: key of the dse.yaml file, is enabled by default. This security feature prevents malicious code execution in the JVM that could harm a DSE instance. Sandbox rules are defined to both blacklist (disallow execution) and whitelist (allow execution) packages, superclasses, and types. For Java/Groovy code entered in the Gremlin console, only the specified allowed operations will execute.
The default sandbox rules may be overridden in the dse.yaml file. The sandbox rules are applied in the following order:
- blacklist_supers, including all classes that implement or extend the listed items
- blacklist_packages, including all sub-packages
- whitelist_packages, including all sub-packages
- whitelist_types, only the specified type, not including sub-classes
- whitelist_supers, including all classes that implement or extend the listed items

Two classes have method-level restrictions:
- java.lang.System: All methods other than currentTimeMillis and nanoTime are blocked (blacklisted).
- java.lang.Thread: currentThread().isInterrupted is an allowed method that can return a wrapped thread with toString, and sleep is another allowed method; all other methods are disallowed.
An example of sandbox rules in the gremlin_server section of the dse.yaml file:

    gremlin_server:
        port: 8182
        threadPoolWorker: 2
        gremlinPool: 0
        scriptEngines:
            gremlin-groovy:
                config:
                    # sandbox_enabled: false
                    sandbox_rules:
                        whitelist_packages:
                            - org.apache.tinkerpop.gremlin.process
                            - java.nio
                        whitelist_types:
                            - java.lang.String
                            - java.lang.Boolean
                            - com.datastax.bdp.graph.spark.SparkSnapshotBuilderImpl
                            - com.datastax.dse.graph.api.predicates.Search
                        whitelist_supers:
                            - groovy.lang.Script
                            - java.lang.Number
                            - java.util.Map
                            - org.apache.tinkerpop.gremlin.process.computer.GraphComputer
                        blacklist_packages:
                            - java.io
                            - org.apache.tinkerpop.gremlin.structure.io
                            - org.apache.tinkerpop.gremlin.groovy.jsr223
                            - java.nio.channels
The Fluent API restricts the allowable operations to secure execution, but uses the sandbox to enable lambda functions.
Authentication, authorization, and encryption
DSE can authenticate users, authorize their access based on graphs or Graph vertex labels, as applicable, secure the stored data with encryption, and secure the Gremlin console with SSL.
DSE Graph security is managed by DSE security. As noted in this topic, you can modify the Graph sandbox by customizing the gremlin_server: key of the dse.yaml file.
To configure the DSE Graph Gremlin console connection to the Gremlin Server, customize the remote.yaml file for your environment.
DSE Graph also supports auditing using DSE auditing; for details, refer to Setting up database auditing.
Restrict lambda
Lambda restriction is controlled by the restrict_lambda (default: true) value.
DataStax Graph (DSG) traversal performance settings
Settings that affect DSG traversal performance.
dev mode
Use the dev traversal source while you are experimenting during early DSG use and design. It allows you to query without creating any indexes, a distinct advantage for exploring your data and data model.
Graph traversal with() options
Use the with() traversal options to fine-tune the performance of graph queries. At various stages of using DSG, different options are more or less useful. For instance, similar to CQL, the allow-filtering option can be useful early in design, but harmful once large datasets are inserted into the database. Familiarize yourself with the capabilities of the available options.
Timeouts
Timeout settings can cause DSG operations to fail in a variety of ways, both client-side and server-side. On the client side, commands from the Gremlin console can time out before reaching the Gremlin Server. Any request typed into the Gremlin console is sent to the Gremlin Server, and the console waits for a response until the timeout expires, at which point it aborts the request and returns control to the user. Issuing the command :remote config timeout none in the Gremlin console overrides the default maximum timeout of 3 minutes with no time limit, so the request never times out. This can be useful if sending a request to the server and getting a response takes longer than the default timeout, for example with complex traversals or large datasets.
On the server side, the cluster-wide timeout settings realtime_evaluation_timeout_in_seconds (default: 30 seconds) and analytic_evaluation_timeout_in_minutes (default: 1008 minutes) are the maximum time to wait for a traversal to evaluate for OLTP and OLAP traversals, respectively. These settings are found in the dse.yaml file. If the timeout behavior for traversal evaluation needs to be overridden for a particular graph, evaluation_timeout can be set on a graph-by-graph basis to override either the OLTP or OLAP traversal evaluation timeout. If complex traversals are timing out during execution, lengthening the appropriate timeout setting should resolve the error.
An additional server-side setting that can be adjusted in the dse.yaml file is schema_agreement_timeout_in_ms (default: 30 seconds), the maximum time to wait for schema versions to agree across a cluster when making schema changes. If a large schema is submitted to a cluster, especially one with indexes defined, this setting may need adjustment before data is submitted to the graph.
Finally, in the dse.yaml file, system_evaluation_timeout_in_seconds (default: 180 seconds) is defined as the maximum time to wait for a graph system request to evaluate. Creating or dropping a graph is a system request affected by this setting, which does not interact with the other timeout options.
Timeout | Default | Impact |
---|---|---|
:remote config timeout none | 3 minutes | Lengthen if command transit from Gremlin console to Gremlin Server is timing out. |
realtime_evaluation_timeout_in_seconds | 30 seconds | Lengthen if the OLTP traversal evaluation is timing out. |
analytic_evaluation_timeout_in_minutes | 1008 minutes | Lengthen if the OLAP traversal evaluation is timing out. |
evaluation_timeout | N/A | Set per-graph to override OLTP or OLAP traversal evaluation timeout. |
schema_agreement_timeout_in_ms | 30 seconds | Lengthen if a large schema is submitted, especially with indexes. |
system_evaluation_timeout_in_seconds | 180 seconds | Lengthen if graph system requests are not completing. |
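
The server-side settings in this table could be lengthened in dse.yaml along the lines of the sketch below. It assumes the timeouts sit under the graph: key, where this topic says cluster-wide Graph options are stored. All values are illustrative rather than recommendations. The per-graph evaluation_timeout and the Gremlin console timeout are set elsewhere and are not shown.

```yaml
# dse.yaml (illustrative fragment; placement under graph: is an assumption)
graph:
    # OLTP traversal evaluation; raised from the documented 30-second default.
    realtime_evaluation_timeout_in_seconds: 120
    # OLAP traversal evaluation; illustrative value, expressed in minutes.
    analytic_evaluation_timeout_in_minutes: 2000
    # Schema agreement; note the unit is milliseconds (60000 ms = 60 seconds).
    schema_agreement_timeout_in_ms: 60000
    # Graph system requests such as creating or dropping a graph.
    system_evaluation_timeout_in_seconds: 300
```

As with any other dse.yaml change, the file would need to be edited on each node and each node restarted for the new values to take effect.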