Graph anti-patterns
Examine common mistakes made with DSE Graph.
Several mistakes are commonly made with DSE Graph. Examining these anti-patterns and the corresponding best practices can ease the learning curve and improve graph application performance.
Not using indexing
A query such as:
g.V().has('name','James Beard')
requires the traversal to check all vertices that use the property key name. Changing this query to:
g.V().has('author', 'name', 'James Beard')
allows the query to consult an index built for all names in author records and retrieve just one vertex to start the traversal. The index is added during schema creation:
schema.vertexLabel('author').index('byName').secondary().by('name').add()
In fact, this one change in the traversal turns the query from an OLAP query into an OLTP query.
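For reference, a minimal schema sketch that supports the indexed lookup above might look like the following; the property key and vertex label definitions are illustrative, and only the index statement appears in the example above:
// Illustrative schema: define the property key, attach it to the vertex label, then index it
schema.propertyKey('name').Text().create()
schema.vertexLabel('author').properties('name').create()
schema.vertexLabel('author').index('byName').secondary().by('name').add()
// Indexed lookup: the traversal starts from a single vertex instead of scanning all vertices
g.V().has('author', 'name', 'James Beard')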
Property key creation
Creating a separate property key for every use is a common mistake. For example:
schema.propertyKey('recipeCreationDate').Timestamp().create()
schema.propertyKey('mealCreationDate').Timestamp().create()
schema.propertyKey('reviewCreationDate').Timestamp().create()
While these property key names make code readable and ease tracking in graph traversals, each additional property key stored requires resources. To decrease overhead, use one property key instead, such as:
schema.propertyKey('timestamp').Timestamp().create()
Since property keys are mostly used in graph traversals along with vertex labels, timestamp is uniquely identified by the combination of vertex label and property key.
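As a sketch of this approach, the single timestamp key can be attached to several vertex labels; the recipe, meal, and review labels here are assumed for illustration:
// One shared property key, reused by multiple vertex labels
schema.propertyKey('timestamp').Timestamp().create()
schema.vertexLabel('recipe').properties('timestamp').create()
schema.vertexLabel('meal').properties('timestamp').create()
schema.vertexLabel('review').properties('timestamp').create()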
Vertex label creation
schema.vertexLabel('recipeAuthor').create()
schema.vertexLabel('bookAuthor').create()
schema.vertexLabel('mealAuthor').create()
schema.vertexLabel('reviewAuthor').create()
While these vertex labels again have the advantage of readability, unless a vertex label will be uniquely queried, it is best to roll the functionality into a single vertex label. For instance, in the code above, recipes, meals, and books are likely to share the same authors, whereas reviews are likely to have a different set of writers and different types of queries. Use two vertex labels instead of four:
schema.vertexLabel('author').create()
schema.vertexLabel('reviewer').create()
In fact, this case may be better suited to a single vertex label person if the overlap between authors and reviewers is great enough. In some cases, a property key that identifies whether a person is an author or a reviewer is a viable option:
schema.propertyKey('type').Text().create()
schema.vertexLabel('person').properties('type').create()
graph.addVertex(label, 'person', 'type', 'author', 'name', 'Jamie Oliver')
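A traversal can then distinguish authors from reviewers by filtering on the type property. This is a sketch against the hypothetical data above; index the property, as described in the indexing section, if it is queried frequently:
// Retrieve only the vertices that represent authors
g.V().has('person', 'type', 'author')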
Mixing schema creation or configuration setting with traversal queries
An example is setting the read consistency to ALL and then counting all vertices that have the property key name with a value of read vertex:
schema.config().option('graph.tx_groups.default.read_consistency').set('ALL');
g.V().has('name', 'read vertex').count()
In Gremlin Server, both statements are run in one transaction. Any changes made during this transaction are applied when it successfully commits both actions. The change in read consistency is not actually applied until the end of the transaction, and therefore only affects the next transaction; the statements are not processed sequentially as individual requests. To avoid such errors in processing, do not mix schema creation or configuration setting with traversal queries in applications. Best practice is to create schema and set configurations before querying the graph database with graph traversals.
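A sketch of the safer ordering issues the same two statements as separate requests, so that the configuration change commits before the traversal runs:
// Request 1: change the read consistency and let its transaction commit
schema.config().option('graph.tx_groups.default.read_consistency').set('ALL')
// Request 2, sent only after the first request has committed: run the traversal
g.V().has('name', 'read vertex').count()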
InterruptedException indicates OLTP query running too long
In general, this exception in the logs indicates that an OLTP query is running too long. The typical cause is that indexes have not been created for the elements used in graph traversal queries. Create the indexes and retry the queries.
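One way to investigate, assuming access to the Gremlin console, is to list the schema and confirm that the traversed properties are indexed; the index shown here is hypothetical:
// Show the current schema, including any defined indexes
schema.describe()
// If a property used by the slow traversal is not covered, add an index for it, for example:
schema.vertexLabel('recipe').index('byTimestamp').secondary().by('timestamp').add()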
g.V().count() and g.E().count() can cause long delays
Running a count on a large graph can cause serious issues. The command must iterate through all the vertices, which can take hours if the graph is large. Any table scan that iterates all vertices is simply not an OLTP process, and counting edges is essentially the same kind of full scan. Using Spark is currently the recommended method to get these counts.
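For example, in the Gremlin console the traversal source can be aliased to the graph's OLAP (analytics) source so that the count runs as a Spark job; the graph name food is hypothetical, and the .a suffix selecting the analytics source is assumed to be available in your DSE version:
// Point g at the analytics (OLAP) traversal source instead of the OLTP source
:remote config alias g food.a
// These counts now run through Spark rather than scanning via OLTP
g.V().count()
g.E().count()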
Setting replication factor too low for graph_name_system
Each graph created in turn creates three DSE database keyspaces: graph_name, graph_name_system, and graph_name_pvt. The graph_name_system keyspace stores the graph schema, and loss of this data renders the entire graph inoperable. Be sure to set the replication factor appropriately based on the cluster configuration.
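As a hedged sketch, assuming your DSE Graph version supports replication options on the system API at graph creation time, the replication strategy can be set explicitly; the graph name, data center name, and factor of 3 are illustrative:
// Create the graph with explicit replication for both the data and system keyspaces
system.graph('food')
    .replication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3}")
    .systemReplication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3}")
    .create()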
Using string concatenation in application instead of parameterized queries
String concatenation in graph applications will critically impair performance. Each unique query string creates an object that is cached on a node, using up node resources. Use parameterized queries (DSE Java Driver, DSE Python Driver, DSE Ruby Driver, DSE Node.js Driver, DSE C# Driver, DSE C/C++ Driver) to prevent problems due to resource allocation.
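As a hedged sketch using the DSE Java driver (class and method names follow the 1.x driver API; dseSession is assumed to be an existing DseSession, and the other drivers offer equivalent parameter binding), bind the value instead of concatenating it into the query string:
import com.datastax.driver.dse.graph.SimpleGraphStatement
// Anti-pattern: "g.V().has('author', 'name', '" + authorName + "')" caches a new query object per value
// Parameterized form: one query string is cached and reused; the value travels as a binding
def stmt = new SimpleGraphStatement("g.V().has('author', 'name', authorName)")
        .set('authorName', 'James Beard')
def results = dseSession.executeGraph(stmt)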
