QuickStart Vertex and edge counting
Methods for counting vertices and edges in DataStax Graph.
About this task
There are different methods for accomplishing vertex and edge counts in DataStax Graph (DSG). Examples here will show how to use the Gremlin count() command as a transactional query. If large datasets are queried, analytical (OLAP) queries using DSE Analytics should be considered.
A transactional Gremlin query can be used to check the number of vertices that exist in the graph, and is useful for exploring small graphs. However, such a query scans the full graph, traversing every vertex, and should not be run on large graphs! If multiple DSE nodes are configured, this traversal step intensively walks all partitions on all nodes in the cluster that have graph data.
Remember, this method is not appropriate for large graphs or production operations.
An analytical Gremlin query can be used to check the number of vertices that exist in any graph, large or small, and is much safer for production operations. The queries are written like transactional Gremlin queries, but are executed with the analytic Spark engine. In Studio, use the execution button to select OLTP or OLAP. In Gremlin console, set the graph traversal source to OLAP before executing the query. DSE Analytics must be enabled on the cluster to use this option.
Vertex and edge counts can also be queried directly from the CQL tables for each vertex label and edge label, once the schema is defined.
As with all queries in Graph, if you are using Gremlin console, first alias the graph traversal g to a graph with the :remote config alias command.
Procedure
- Transactional Gremlin vertex count()
- Use the traversal step count(). A graph traversal g is chained with V() to retrieve all vertices and count() to compute the number of vertices. Chaining executes sequential traversal steps in the most efficient order.
  g.V().count()
In Studio, the result is:
In Gremlin console, the result is:
==>2
A warning is also returned, advising that this traversal can incur long latency unless it is restricted by label:
[warn] This traversal could read elements without a label restriction. This may degrade performance if many element labels are involved.
  Suggestions:
  - Add hasLabel steps to the traversal where vertices are read or edges traversed. Examples:
      Instead of V(), use V().hasLabel('vertex_label')
      Instead of out(), use out('edge_label')
  - Suppress this warning by beginning the traversal by g.with("label-warning", false).
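The suggestions in the warning can be applied as in the following minimal sketch. It assumes the graph has a vertex label named person (the label used in the CQL examples on this page); substitute the labels defined in your own schema.

```groovy
// Count only the vertices that carry the person label,
// instead of reading vertices of every label.
g.V().hasLabel('person').count()

// Alternatively, keep the unrestricted count and suppress the warning,
// using the option named in the warning text itself.
g.with("label-warning", false).V().count()
```

Suppressing the warning does not make the traversal cheaper: the second query still scans every vertex, so the restriction to small graphs still applies.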
- Transactional Gremlin edge counts
- To do an edge count with Gremlin, replace V() with E():
  g.E().count()
In Studio, the result is:
In Gremlin console, the result is:
==>1
A warning is also returned, advising that this traversal can incur long latency unless it is restricted by label:
[warn] This traversal could read elements without a label restriction. This may degrade performance if many element labels are involved.
  Suggestions:
  - Add hasLabel steps to the traversal where vertices are read or edges traversed. Examples:
      Instead of V(), use V().hasLabel('vertex_label')
      Instead of out(), use out('edge_label')
  - Suppress this warning by beginning the traversal by g.with("label-warning", false).
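The same label restriction applies to edge counts. The sketch below assumes an edge label named authored (taken from the person->authored->book example on this page). The groupCount() step is a standard Apache TinkerPop step, shown here as one possible way to see a per-label breakdown.

```groovy
// Count only the edges that carry the authored label.
g.E().hasLabel('authored').count()

// Optional: count edges per label in a single traversal.
// This still reads every edge, so use it only on small graphs.
g.E().groupCount().by(label)
```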
- Analytical Gremlin vertex count()
Restriction: The following steps will only execute if DSE Analytics is enabled. Do not run if only Graph is enabled on the cluster.
- To use Studio, configure the Run option to "Execute using analytic engine (Spark)" before running the query. The result is similar to that of the transactional query.
- To use Gremlin console, first configure the traversal to run an analytical query:
  :remote config alias g food_qs.a
  where food_qs.a denotes that the graph will be used for analytic purposes. Then run the command:
  g.V().count()
If DSE Analytics is not enabled, this query will fail.
- Vertex and edge counts using CQL
- Use the CQL statement SELECT count(*) to retrieve the count of either vertices or edges with an appropriate CQL table. For instance, retrieve the count of person vertices:
  SELECT count(*) FROM food_qs.person;
   count
  -------
       1

  (1 rows)
  To retrieve a count of all vertices, query each vertex label table.
- Retrieve the count of person->authored->book edges:
  SELECT count(*) FROM food_qs.person__authored__book;
   count
  -------
       1

  (1 rows)
  To retrieve a count of all edges, query each edge label table.