QuickStart Vertex and edge counting

Methods for counting vertices and edges in DataStax Graph.

About this task

DataStax Graph (DSG) offers several ways to count vertices and edges. The examples here start with the Gremlin count() step run as a transactional (OLTP) query. For large datasets, consider analytical (OLAP) queries using DSE Analytics instead.

A transactional Gremlin query can be used to check the number of vertices that exist in the graph, and is useful for exploring small graphs. However, such a query scans the full graph, traversing every vertex, and should not be run on large graphs! If multiple DSE nodes are configured, this traversal step intensively walks all partitions on all nodes in the cluster that have graph data.

Remember, this method is not appropriate for large graphs or production operations.

An analytical Gremlin query can be used to check the number of vertices that exist in any graph, large or small, and is much safer for production operations. The queries are written like transactional Gremlin queries, but are executed with the analytic Spark engine. In Studio, use the execution button to select OLTP or OLAP. In Gremlin console, set the graph traversal source to OLAP before executing the query. DSE Analytics must be enabled on the cluster to use this option.

Vertex and edge counts can also be queried directly from the CQL tables for each vertex label and edge label, once the schema is defined.

As with all queries in Graph, if you are using Gremlin console, alias the graph traversal g to a graph with :remote config alias g food_qs.g before running any commands.

Procedure

  • Transactional Gremlin vertex count()

  • Use the traversal step count(). A graph traversal g is chained with V() to retrieve all vertices and count() to compute the number of vertices. Chaining executes sequential traversal steps in the most efficient order.

    g.V().count()

    In Studio, the result is:

    GSStudioVCountTwo

    In Gremlin console, the result is:

    ==>2

    A warning is also displayed, advising that this traversal can have high latency because it is not restricted by label:

    [warn] This traversal could read elements without a label restriction.
    This may degrade performance if many element labels are involved. Suggestions:
     - Add hasLabel steps to the traversal where vertices are read or edges traversed.  Examples:
         Instead of V(), use V().hasLabel('vertex_label')
         Instead of out(), use out('edge_label')
     - Suppress this warning by beginning the traversal by g.with("label-warning", false).
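
    As a minimal sketch of the two suggestions in the warning, the count can be restricted to a single vertex label, or the warning can be suppressed for a deliberate full scan. The person label is assumed from the food_qs examples on this page; substitute a label from your own schema.

    // Count only vertices with the (assumed) label person
    g.V().hasLabel('person').count()

    // Run the unrestricted count, but suppress the label warning
    g.with("label-warning", false).V().count()

    Suppressing the warning only hides the message. The second traversal still scans every vertex.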
  • Transactional Gremlin edge counts

  • To do an edge count with Gremlin, replace V() with E():

    g.E().count()

    In Studio, the result is:

    GSStudioECountOne

    In Gremlin console, the result is:

    ==>1

    A warning is also displayed, advising that this traversal can have high latency because it is not restricted by label:

    [warn] This traversal could read elements without a label restriction.
    This may degrade performance if many element labels are involved. Suggestions:
     - Add hasLabel steps to the traversal where vertices are read or edges traversed.  Examples:
         Instead of V(), use V().hasLabel('vertex_label')
         Instead of out(), use out('edge_label')
     - Suppress this warning by beginning the traversal by g.with("label-warning", false).
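
    The same label restriction can be sketched for edges. The authored edge label and the person vertex label are assumed from the person->authored->book example on this page; adjust them to match your schema.

    // Count only edges with the (assumed) label authored
    g.E().hasLabel('authored').count()

    // Count authored edges that leave person vertices
    g.V().hasLabel('person').outE('authored').count()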
  • Analytical Gremlin vertex count()

    Restriction: The following steps will only execute if DSE Analytics is enabled. Do not run if only Graph is enabled on the cluster.

  • To use Studio, configure the Run option to "Execute using analytic engine (Spark)" before running the query. The result is similar to that of the transactional query.

  • To use Gremlin console, first configure the traversal to run an analytical query:

    :remote config alias g food_qs.a

    where food_qs.a denotes that the graph will be used for analytic purposes. Then run the command:

    g.V().count()

    If DSE Analytics is not enabled, this query will fail.
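
    Because an analytical query is intended for full-graph scans, it may also be a reasonable place to count vertices per label in one pass. The following is a sketch only, using the standard TinkerPop groupCount() step; it assumes the food_qs graph and that DSE Analytics is enabled.

    // Point g at the analytic (OLAP) traversal source
    :remote config alias g food_qs.a

    // Return a map of vertex label to number of vertices
    g.V().groupCount().by(T.label)

    // Switch back to the transactional (OLTP) traversal source
    :remote config alias g food_qs.g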

  • Vertex and edge counts using CQL

  • Use the CQL statement SELECT count(*) to retrieve the count of either vertices or edges with an appropriate CQL table. For instance, retrieve the count of person vertices:

    SELECT count(*) FROM food_qs.person;

    To retrieve a count of all vertices, query each vertex label table and add up the results. The result for the person table is:

     count
    -------
         1
    
    (1 rows)
  • Retrieve the count of person->authored->book edges:

    SELECT count(*) FROM food_qs.person__authored__book;

    To retrieve a count of all edges, query each edge label table and add up the results.

    In Studio:

    GSStudioSelectPersonCQL

    In cqlsh:

     count
    -------
         1
    
    (1 rows)

