Indexing graph overview 

Explain indexes and how they affect DSE Graph performance.

DSE Graph implements two types of indexes, vertex-centric indexes and global indexes. Vertex-centric indexes are local and specific to a single vertex. Global indexes are specific to a vertex label and property and are graph-wide. All indexes contribute to the performance of graph traversals on large distributed graphs.

Vertex-centric indexing (VCI) overview 

Vertex-centric indexes (VCI) are specific to a vertex, unlike global indexes which are global to the graph and index elements for fast global lookups. Vertex-centric indexes sort and index the incident edges and adjacent vertices of a vertex according to the incident edge labels or properties. When a vertex is queried, its index is consulted to avoid linear scans of all incident edges. Traversals can be reduced to O(1) or O(log n) from O(n). A typical graph traversal touches numerous vertices, compounding the cost of each incident edge scan if indexes are not consulted.

In DSE Graph, vertex-centric indexing is maintained as a materialized view (MV) table. When data is written or updated in the graph, the index is updated rather than utilizing the read repair functionality of Cassandra. A consequence of using a MV table is higher write latencies, but results in lower read latencies for graph traversals. Edge indexes and property indexes are vertex-centric indexes.

Vertex-centric indexing also plays a role in solving the super-node issue. A super-node is a vertex that has an exponentially larger number of incident edges. The example generally given is to compare the number of followers that reader has to those of a celebrity - hundreds or thousands of followers, compared to millions of followers. A graph traversal checking the index of a super-node will take an outsized amount of time just to read the index table. Using vertex-centric indexing in conjunction with a partitioned vertex table (PVT), the index can be stored on multiple partitions and distributed across the DSE cluster. For vertices that have in excess of one million edges, graph partitioning is necessary due to the storage limitations of a single Cassandra table. Distributing the index tables also enables better response to graph traversals.
Note: Partitioned vertex tables are an experimental feature for handling supernodes. However, data modeling techniques are currently a better avenue for mitigating supernode issues.

Global indexing overview 

Indexes can affect traversal query performance. Decreasing the number of starting points for a graph traversal can greatly reduce the latency for a query result. If a traversal must start by checking all the vertices in a graph, time is lost finding the right starting point. If a starting vertex can be identified, that time is not required. Indexing the location of vertices based on the vertex label and property value improves the performance.

Global indexing uses the built-in indexing features of DSE. DSE Graph is stored in Cassandra, so two types of indexing available are based on Cassandra secondary indexing capability and materialized views. In addition, DSE Graph can take advantage of DSE Search for full text and string indexing. Global indexing can be accomplished with any of these three indexing types. The type selected depends on the data itself. The type of index lookup will also affect performance.

Secondary indexing in DSE Graph follows the same rule of thumb as secondary indexing in Cassandra. This type of index is meant for lower cardinality values, alternatively, for low selectivity values. Selectivity is derived from cardinality, using the following formula:
selectivity = ( cardinality / number of rows ) * 100%
In general, low cardinality results in low selectivity, and high cardinality results in high selectivity. Searching by country is a good candidate for secondary indexing.

Materialized view indexing uses Cassandra materialized views to store data. Materialized views are tables generated from a base table to provide a query based on a different primary key than the base table. This type of index is best used for values of high cardinality or nearly unique values or high selectivity. Searching materialized views yields similar response times to searching base tables, although writing the data incurs a small time penalty.

Search indexes rely on DSE Search. Since graph data is stored in Cassandra tables, one search core is available per vertex label. These indexes differ from secondary and materialized view indexes in their syntax due to this unique property. For each vertex label that will be indexed with search, all properties must be added to a single search index named search. Because search is implemented with DSE Search, two indexing options are available, full text and string. Full text indexing performs tokenization and secondary processing such as case normalization. Full text indexing is useful for queries where partial match of text is required. String indexing is useful for queries where an exact string is sought and no tokenization must be performed, such as for Solr faceting. This type of index is best for low selectivity, like secondary indexes.

Composite index keys are not currently supported in DSE Graph.

Indexing practices 

More than one index can be created on the same property, such as creating both a secondary index and a search index on the property amount. The DSE Graph query optimizer will automatically use the appropriate index when processing a query. Different index types may be created on different properties as appropriate, based on the selectivity.