Explain indexes and how they affect DSE Graph performance.
DSE Graph implements two types of indexes, vertex-centric indexes and global indexes. Vertex-centric indexes are local and specific to a single vertex. Global indexes are specific to a vertex label and property and are graph-wide. All indexes contribute to the performance of graph traversals on large distributed graphs.
Vertex-centric indexes (VCI) are specific to a vertex, unlike global indexes which are global to the graph and index elements for fast global lookups. Vertex-centric indexes sort and index the incident edges and adjacent vertices of a vertex according to the incident edge labels or properties. When a vertex is queried, its index is consulted to avoid linear scans of all incident edges. Traversals can be reduced to O(1) or O(log n) from O(n). A typical graph traversal touches numerous vertices, compounding the cost of each incident edge scan if indexes are not consulted.
In DSE Graph, vertex-centric indexing is maintained as a materialized view (MV) table. When data is written or updated in the graph, the index is updated rather than utilizing the read repair functionality of Cassandra. A consequence of using a MV table is higher write latencies, but results in lower read latencies for graph traversals. Edge indexes and property indexes are vertex-centric indexes.
Indexes can affect traversal query performance. Decreasing the number of starting points for a graph traversal can greatly reduce the latency for a query result. If a traversal must start by checking all the vertices in a graph, time is lost finding the right starting point. If a starting vertex can be identified, that time is not required. Indexing the location of vertices based on the vertex label and property value improves the performance.
Global indexing uses the built-in indexing features of DSE. DSE Graph is stored in Cassandra, so two types of indexing available are based on Cassandra secondary indexing capability and materialized views. In addition, DSE Graph can take advantage of DSE Search for full text and string indexing. Global indexing can be accomplished with any of these three indexing types. The type selected depends on the data itself. The type of index lookup will also affect performance.
In general, low cardinality results in low selectivity, and high cardinality results in high selectivity. Searching by country is a good candidate for secondary indexing.
selectivity = ( cardinality / number of rows ) * 100%
Materialized view indexing uses Cassandra materialized views to store data. Materialized views are tables generated from a base table to provide a query based on a different primary key than the base table. This type of index is best used for values of high cardinality or nearly unique values or high selectivity. Searching materialized views yields similar response times to searching base tables, although writing the data incurs a small time penalty.
Search indexes rely on DSE Search. Since graph data is stored in Cassandra tables, one search core is available per vertex label. These indexes differ from secondary and materialized view indexes in their syntax due to this unique property. For each vertex label that will be indexed with search, all properties must be added to a single search index named search. Because search is implemented with DSE Search, two indexing options are available, full text and string. Full text indexing performs tokenization and secondary processing such as case normalization. Full text indexing is useful for queries where partial match of text is required. String indexing is useful for queries where an exact string is sought and no tokenization must be performed, such as for Solr faceting. This type of index is best for low selectivity, like secondary indexes.
Composite index keys are not currently supported in DSE Graph.
More than one index can be created on the same property, such as creating both a secondary
index and a search index on the property
amount. The DSE Graph query
optimizer will automatically use the appropriate index when processing a query. Different
index types may be created on different properties as appropriate, based on the